Compositions, organisms and methodologies employing a novel human kinase

ABSTRACT

This invention provides compositions, organisms and methodologies employing a novel human protein kinase, NRHK1. The novel protein kinase is encoded by a human gene comprising 21 exons. The human gene is localized in or near the 9q34 locus of human chromosome 9. The sequence similarity between the novel human protein and the consensus sequence of NIMA-related kinase indicates that the novel human protein may function as an NIMA-related kinase.

[0001] The present application incorporates by reference U.S. Provisional Application Serial No. 60/417,155 filed Oct. 10, 2002 and entitled “Composition, Organisms and Methodologies Employing a Novel Human Kinase.”

FIELD OF THE INVENTION

[0002] The present invention relates to compositions, organisms and methodologies employing a novel human protein kinase, NIMA-related human kinase 1 (NRHK1), which has sequence homology to the catalytic domain of tyrosine protein kinases and serine/threonine protein kinases, and is distantly related to NIMA-related protein kinases. This invention can be used for diagnosing, prognosing and treating kinase-related diseases and, in particular, diseases associated with aberrant expression of NRHK1.

BACKGROUND OF THE INVENTION

[0003] Protein kinases regulate many different cell proliferation, differentiation, and signaling processes by adding phosphate groups to proteins. Uncontrolled signaling has been implicated in a variety of disease conditions including inflammation, cancer, arteriosclerosis, and psoriasis. Reversible protein phosphorylation is the main strategy for controlling activities of eukaryotic cells. It is estimated that more than 1,000 of the 10,000 proteins active in a typical mammalian cell are phosphorylated. The high energy phosphate, which drives activation, is generally transferred from adenosine triphosphate molecules (ATP) to a particular protein by protein kinases and removed from that protein by protein phosphatases. Phosphorylation occurs in response to extracellular signals (hormones, neurotransmitters, growth and differentiation factors, etc.), cell cycle checkpoints, and environmental or nutritional stresses. The phosphorylation process is roughly analogous to turning on a molecular switch. When the switch goes on, the appropriate protein kinase activates a metabolic enzyme, regulatory protein, receptor, cytoskeletal protein, ion channel or pump, or transcription factor.

[0004] The kinases comprise the largest known protein group, a superfamily of enzymes with widely varied functions and specificities. They are usually named after their substrate, their regulatory molecules, or some aspect of a mutant phenotype. With regard to substrates, the protein kinases may be roughly divided into two groups: those that phosphorylate tyrosine residues (protein tyrosine kinases, PTK) and those that phosphorylate serine or threonine residues (serine/threonine kinases, STK). A few protein kinases have dual specificity and phosphorylate threonine and tyrosine residues. Almost all kinases contain a similar 250-300 amino acid catalytic domain. The primary structure of the kinase domains is conserved and can be further subdivided into 11 subdomains. The N-terminal of the kinase domain, which contains subdomains I-IV, generally folds into a lobe-like structure that binds and orients the ATP (or GTP) donor molecule. The C terminal of the kinase domain forms a larger lobe, which contains subdomains VI-XI, binds the protein substrate and carries out the transfer of the gamma phosphate from ATP to the hydroxyl group of a serine, threonine, or tyrosine residue. Subdomain V spans the two lobes. Each of the 11 subdomains contains specific residues and motifs or patterns of amino acids that are characteristic of that subdomain and are highly conserved.

[0005] The kinases may be categorized into families by the different amino acid sequences (generally between 5 and 100 residues) located on either side of, or inserted into loops of, the kinase domain. These added amino acid sequences allow the regulation of each kinase as it recognizes and interacts with its target protein.

[0006] The presence of a phosphate moiety modulates protein function in multiple ways. A common mechanism involves changes in the catalytic properties (Vmax and Km) of an enzyme, leading to its activation or inactivation.

[0007] A second widely recognized mechanism involves promoting protein-protein interactions. An example of this is the tyrosine autophosphorylation of the ligand-activated EGF receptor tyrosine kinase. This event triggers the high-affinity binding of the phosphotyrosine residue on the receptor's C-terminal intracellular domain to the SH2 motif of an adaptor molecule, Grb2. Grb2, in turn, binds through its SH3 motif to a second adaptor molecule, such as SHC. The formation of this complex activates the signaling events that are responsible for the biological effects of EGF. Serine and threonine phosphorylation events also have been recently recognized to exert their biological function through protein-protein interaction events that are mediated by the high-affinity binding of phosphoserine and phosphothreonine to the WW motifs present in a large variety of proteins.

[0008] A third important outcome of protein phosphorylation is changes in the subcellular localization of the substrate. As an example, nuclear import and export events in a large diversity of proteins are regulated by protein phosphorylation.

[0009] Many kinases are involved in regulatory cascades wherein their substrates may include other kinases whose activities are regulated by their phosphorylation state. Ultimately the activities of some downstream effectors are modulated by phosphorylation resulting from activation of such a pathway.

SUMMARY OF THE INVENTION

[0010] The present invention discloses compositions, organisms and methodologies employing a novel human protein kinase. The new human protein kinase shares sequence homology with the consensus sequence of the catalytic domains of serine/threonine protein kinases, pkinases, and tyrosine kinases. The new protein also shows homologies to the kinase domain of an NIMA-related kinase. The gene encoding this protein is localized near locus 9q34 of human chromosome 9. This new gene is hereinafter referred to as NIMA-related human kinase 1 (NRHK1) gene, and its encoded protein(s) is referred to as NRHK1 or NRHK1 kinase.

[0011] The kinase domain in NRHK1 shows 100% sequence alignment with the consensus sequences of the catalytic domains of serine/threonine protein kinases, 100% sequence alignment with the consensus sequence of the pkinase domain, 87.5% sequence alignment with the consensus sequences of the catalytic domain of tyrosine kinases, and 28% sequence alignment with the kinase domain of an NIMA-related protein kinase. The utilities of various kinase domains are known in the art. The unique peptide sequences, and nucleic acid sequences that encode the peptides, can be used as models for the development of human therapeutic targets, aid in the identification of therapeutic proteins, and serve as targets for the development of human therapeutic agents that modulate kinase activity in cells and tissues that express the kinase.

[0012] In one aspect, the invention provides isolated polynucleotides comprising a nucleotide sequence encoding NRHK1 or a variant of NRHK1.

[0013] In another aspect, the invention provides isolated polypeptides comprising the amino acid sequence of NRHK1 or a variant of NRHK1.

[0014] In yet another aspect, the invention provides agents that modulate the expression level of the NRHK1 gene or an activity of NRHK1.

[0015] The invention also provides methods for (a) detecting polynucleotides comprising a nucleotide sequence encoding NRHK1 or a variant of NRHK1 and (b) detecting polypeptides comprising an amino acid sequence of NRHK1 or a variant of NRHK1 in a biological sample.

[0016] The invention further provides methods for screening agents that modulate expression level of the NRHK1 gene or an activity of NRHK1.

[0017] The invention further provides cell lines harboring the NRHK1 gene, animals transgenic for the NRHK1 gene, and animals with interrupted NRHK1 gene (NRHK1 knockout animals). These cell lines and animals can be used to study the functions of NRHK1.

[0018] In still another aspect, the invention provides polynucleotides capable of inhibiting NRHK1 gene expression by RNA interference.

[0019] The invention further provides methods of inhibiting NRHK1 gene expression by introducing siRNAs or other RNAi sequences into target cells.

[0020] The preferred embodiments of the inventions are described below in the Detailed Description of the Invention. Unless specifically noted, it is intended that the words and phrases in the specification and claims be given the ordinary and accustomed meaning to those of ordinary skill in the applicable art or arts. If any other meaning is intended, the specification will specifically state that a special meaning is being applied to a word or phrase.

[0021] It is further intended that the inventions not be limited only to the specific structure, material or methods that are described in the preferred embodiments, but include any and all structures, materials or methods that perform the claimed function, along with any and all known or later-developed equivalent structures, materials or methods for performing the claimed function.

[0022] Further examples exist throughout the disclosure, and it is not applicant's intention to exclude from the scope of his invention the use of structures, materials, or methods that are not expressly identified in the specification, but nonetheless are capable of performing a claimed function.

BRIEF DESCRIPTION OF THE DRAWINGS

[0023] The inventions of this application are better understood in conjunction with the following drawings, in which:

[0024] FIG. 1 compares amino acid residues 28 to 297 of NRHK1 to the catalytic domain of a family of Ser/Thr protein kinases.

[0025] FIG. 2 shows the sequence alignment between amino acid residues 28 to 296 of NRHK1 and the protein kinase domain of pkinases.

[0026] FIG. 3 shows the sequence alignment between amino acid residues 59 to 294 of NRHK1 and the catalytic domain of a family of tyrosine kinases.

[0027] FIG. 4 illustrates the sequence alignment between amino acid residues 25 to 285 of NRHK1 and amino acid residues 1 to 246 of an NIMA-related protein kinase.

[0028] FIG. 5 shows the hydrophobicity profile of NRHK1.

DETAILED DESCRIPTION OF THE INVENTION

[0029] The following detailed description is presented to enable any person skilled in the art to make and use the invention. For purposes of explanation, specific nomenclature is set forth to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that these specific details are not required to practice the invention. Descriptions of specific applications are provided only as representative examples. Various modifications to the preferred embodiments will be readily apparent to one skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the invention. The present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest possible scope consistent with the principles and features disclosed herein.

[0030] The present invention is based on the sequence information obtained from a newly developed genomic prediction pipeline. Briefly, the X-ray crystal structures of the catalytic domains of protein kinases were collected and aligned together according to their structural identity/similarities. The alignment was converted into a “scoring matrix” which carried the structural profile of the kinase catalytic domains. This scoring matrix was then used to search the Celera Human Genome database and pull out sequences that have kinase catalytic domains.
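The profile search described above can be illustrated with a position-specific scoring matrix (PSSM), which is one common way to turn an alignment into a scoring matrix. The following is a minimal sketch only: the specification does not disclose how the actual matrix was constructed, so the log-odds formula, the uniform background frequency, the pseudocount, the function names `build_pssm` and `best_window`, and the toy alignment are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, pseudocount=1.0):
    """Build a log-odds matrix (one dict per alignment column).

    Assumptions: ungapped, equal-length sequences and a uniform
    background frequency of 1/20 for every amino acid.
    """
    background = 1.0 / len(AMINO_ACIDS)
    length = len(aligned_seqs[0])
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned_seqs if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column_scores = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column_scores[aa] = math.log2(freq / background)
        pssm.append(column_scores)
    return pssm


def best_window(pssm, sequence):
    """Slide the matrix along a sequence; return (start, score) of the best window."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[i].get(sequence[start + i], 0.0) for i in range(width))
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Hypothetical toy alignment, not taken from any real kinase family.
toy_alignment = ["GEGSFG", "GKGAFG", "GSGTFG"]
toy_pssm = build_pssm(toy_alignment)
hit = best_window(toy_pssm, "MMMMGEGSFGMMMM")
```

In this sketch, a window that matches the conserved columns of the toy alignment scores higher than the flanking windows, which is the basic idea behind pulling candidate domains out of a sequence database.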

[0031] Based on this analysis, the present invention provides the amino acid sequence of a human kinase peptide containing a kinase domain that is highly homologous to the consensus sequences of the catalytic domain of serine/threonine protein kinases, cDNA sequences and genomic sequences that encode the kinase peptide, and information about the closest art known protein/peptide/domain that has structural or sequence homology to the kinase of the present invention.

[0032] The peptide of the present invention may be used for the development of commercially important products and services. Various aspects of the invention are described in detail in the following subsections. The use of subsections is not meant to limit the invention. Each subsection applies to any aspect of the invention.

[0033] Definitions and Terms

[0034] To facilitate the understanding of the present invention, a number of terms and phrases are defined below:

[0035] As used herein, a polynucleotide or a polypeptide is “isolated” if it is removed from its native environment. For instance, a polynucleotide or a polypeptide is isolated through a purification process such that the polynucleotide or polypeptide is substantially free of cellular material or free of chemical precursors. The polynucleotide/polypeptide of the present invention can be purified to homogeneity or other degrees of purity. The level of purification will be based on the intended use. As appreciated by one of ordinary skill in the art, a polynucleotide/polypeptide can perform its desired function(s) even in the presence of considerable amounts of other components or molecules.

[0036] In some uses, a polynucleotide/polypeptide that is “substantially free of cellular material” includes preparations which have less than about 30% (by weight) other polynucleotides/polypeptides including contaminating polynucleotides/polypeptides. For instance, the preparations can have less than about 20%, less than about 10%, or less than about 5% other polynucleotides/polypeptides. If a polynucleotide/polypeptide preparation is recombinantly produced, it can be substantially free of culture medium, i.e., culture medium components representing less than about 20% by weight of the polynucleotide/polypeptide preparation.

[0037] The language “substantially free of chemical precursors” includes preparations in which the polynucleotide/polypeptide is separated from chemical precursors or other chemicals that are involved in the synthesis of the polynucleotide/polypeptide. In one embodiment, the language “substantially free of chemical precursors” includes kinase preparations having less than about 30% (by weight), less than about 20% (by weight), less than about 10% (by weight), or less than about 5% (by weight) chemical precursors or other chemicals used in the synthesis.

[0038] As used in the present invention, a polynucleotide introduced into a cell is an isolated polynucleotide. Likewise, a polypeptide expressed from an introduced vector in a cell is also an isolated polypeptide.

[0039] A “polynucleotide” can include any number of nucleotides. For instance, a polynucleotide can have at least 10, 20, 25, 30, 40, 50, 100 or more nucleotides. A polynucleotide can be DNA or RNA, double-stranded or single-stranded. A polynucleotide encodes a polypeptide if the polypeptide is capable of being transcribed and/or translated from the polynucleotide. Transcriptional and/or translational regulatory sequences, such as promoter and/or enhancer(s), can be added to the polynucleotide before said transcription and/or translation occurs. Moreover, if the polynucleotide is single-stranded, the corresponding double-stranded DNA containing the original polynucleotide and its complementary sequence can be prepared before said transcription and/or translation.

[0040] As used herein, “a variant of a polynucleotide” refers to a polynucleotide that differs from the original polynucleotide by one or more substitutions, additions, and/or deletions. For instance, a variant of a polynucleotide can have 1, 2, 5, 10, 15, 20, 25 or more nucleotide substitutions, additions or deletions. Preferably, the modification(s) is in-frame, i.e., the modified polynucleotide can be transcribed and translated to the original or intended stop codon. If the original polynucleotide encodes a polypeptide with a biological activity, the polypeptide encoded by a variant of the original polynucleotide substantially retains such activity.

[0041] Preferably, the biological activity is reduced/enhanced by less than 50%, or more preferably, less than 20%, relative to the original activity.

[0042] A variant of a polynucleotide can be a polynucleotide that is capable of hybridizing to the original polynucleotide, or the complementary sequence thereof, under reduced stringent conditions, preferably stringent conditions, or more preferably, highly stringent conditions. Examples of conditions of different stringency are listed in Table 1. Highly stringent conditions are those that are at least as stringent as conditions A-F; stringent conditions are at least as stringent as conditions G-L; and reduced stringency conditions are at least as stringent as conditions M-R. As used in Table 1, hybridization is carried out under a given hybridization condition for about 2 hours, followed by two 15-minute washes under the corresponding washing condition(s).

TABLE 1
Stringency Conditions

Stringency Condition | Polynucleotide Hybrid | Hybrid Length (bp)¹ | Hybridization Temperature and Buffer^(H) | Wash Temp. and Buffer^(H)
A | DNA:DNA | >50 | 65° C.; 1xSSC -or- 42° C.; 1xSSC, 50% formamide | 65° C.; 0.3xSSC
B | DNA:DNA | <50 | T_(B)*; 1xSSC | T_(B)*; 1xSSC
C | DNA:RNA | >50 | 67° C.; 1xSSC -or- 45° C.; 1xSSC, 50% formamide | 67° C.; 0.3xSSC
D | DNA:RNA | <50 | T_(D)*; 1xSSC | T_(D)*; 1xSSC
E | RNA:RNA | >50 | 70° C.; 1xSSC -or- 50° C.; 1xSSC, 50% formamide | 70° C.; 0.3xSSC
F | RNA:RNA | <50 | T_(F)*; 1xSSC | T_(F)*; 1xSSC
G | DNA:DNA | >50 | 65° C.; 4xSSC -or- 42° C.; 4xSSC, 50% formamide | 65° C.; 1xSSC
H | DNA:DNA | <50 | T_(H)*; 4xSSC | T_(H)*; 4xSSC
I | DNA:RNA | >50 | 67° C.; 4xSSC -or- 45° C.; 4xSSC, 50% formamide | 67° C.; 1xSSC
J | DNA:RNA | <50 | T_(J)*; 4xSSC | T_(J)*; 4xSSC
K | RNA:RNA | >50 | 70° C.; 4xSSC -or- 50° C.; 4xSSC, 50% formamide | 67° C.; 1xSSC
L | RNA:RNA | <50 | T_(L)*; 2xSSC | T_(L)*; 2xSSC
M | DNA:DNA | >50 | 50° C.; 4xSSC -or- 40° C.; 6xSSC, 50% formamide | 50° C.; 2xSSC
N | DNA:DNA | <50 | T_(N)*; 6xSSC | T_(N)*; 6xSSC
O | DNA:RNA | >50 | 55° C.; 4xSSC -or- 42° C.; 6xSSC, 50% formamide | 55° C.; 2xSSC
P | DNA:RNA | <50 | T_(P)*; 6xSSC | T_(P)*; 6xSSC
Q | RNA:RNA | >50 | 60° C.; 4xSSC -or- 45° C.; 6xSSC, 50% formamide | 60° C.; 2xSSC
R | RNA:RNA | <50 | T_(R)*; 4xSSC | T_(R)*; 4xSSC
# T_(m) (° C.) = 81.5 + 16.6 (log₁₀Na⁺) + 0.41 (% G + C) − (600/N), where N is the number of bases in the hybrid, and Na⁺ is the concentration of sodium ions in the hybridization buffer (Na⁺ for 1xSSC = 0.165 M).
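As an illustration of the melting temperature formula above, the calculation can be sketched as follows. The function name `melting_temperature` and the `ssc_multiple` parameter are hypothetical, and scaling the sodium ion concentration linearly with SSC strength (from the stated 0.165 M for 1xSSC) is an assumption rather than something the specification states.

```python
import math


def melting_temperature(gc_percent, hybrid_length, ssc_multiple=1.0):
    """Return T_m in degrees C from the formula given with Table 1.

    gc_percent    -- percentage of G + C bases in the hybrid (0-100)
    hybrid_length -- N, the number of bases in the hybrid
    ssc_multiple  -- buffer strength in multiples of 1xSSC (assumption:
                     Na+ concentration scales linearly from 0.165 M)
    """
    if hybrid_length <= 0:
        raise ValueError("hybrid_length must be positive")
    na_molar = 0.165 * ssc_multiple
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 600.0 / hybrid_length)


# Example: a 30-base hybrid with 50% G + C content in 1xSSC.
tm_example = melting_temperature(gc_percent=50.0, hybrid_length=30)
```

Because the sodium term is logarithmic, a stronger buffer such as 4xSSC raises the computed T_m, which is consistent with higher SSC concentrations corresponding to less stringent conditions in Table 1.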

[0043] It will be appreciated by those of ordinary skill in the art that, as a result of the degeneracy of the genetic code, there are many polynucleotide variants that encode the same polypeptide. Some of these polynucleotide variants bear minimal sequence homology to the original polynucleotide. Nonetheless, polynucleotides that vary due to differences in codon usage are specifically contemplated by the present invention.

[0044] As used herein, a “polypeptide” can include any number of amino acid residues. For instance, a polypeptide can have at least 5, 10, 15, 20, 30, 40, 50 or more amino acid residues.

[0045] As used herein, a “variant of a polypeptide” is a polypeptide that differs from the original polypeptide by one or more substitutions, deletions, and/or insertions. Preferably, these modifications do not substantially change (e.g., reduce or enhance) the original biological function of the polypeptide. For instance, a variant can reduce or enhance or maintain the biological activities of the original polypeptide. Preferably, the biological activities of the variant are reduced or enhanced by less than 50%, or more preferably, less than 20%, relative to the original polypeptide.

[0046] Similarly, the ability of a variant to react with antigen-specific antisera can be enhanced or reduced by less than 50%, preferably less than 20%, relative to the original polypeptide. These variants can be prepared and evaluated by modifying the original polypeptide sequence and then determining the reactivity of the modified polypeptide with the antigen-specific antibodies or antisera.

[0047] Preferably, a variant polypeptide contains one or more conservative substitutions. A “conservative substitution” is one in which an amino acid is substituted for another amino acid which has similar properties, such that one skilled in the art would expect that the secondary structure and hydropathic nature of the substituted polypeptide will not be substantially changed. Conservative amino acid substitutions can be made on the basis of similarity in polarity, charge, solubility, hydrophobicity, hydrophilicity and/or the amphipathic nature of the residues. Negatively charged amino acids include aspartic acid and glutamic acid, and positively charged amino acids include lysine and arginine. Amino acids having uncharged polar head groups and similar hydrophilicity values include leucine, isoleucine and valine, or glycine and alanine, or asparagine and glutamine, or serine, threonine, phenylalanine and tyrosine. Other groups of amino acids that can produce conservative changes include: (1) ala, pro, gly, glu, asp, gln, asn, ser, thr; (2) cys, ser, tyr, thr; (3) val, ile, leu, met, ala, phe; (4) lys, arg, his; and (5) phe, tyr, trp, his. A polypeptide variant can also contain nonconservative changes.
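The five groups of amino acids listed above can be encoded directly as a lookup. The following is a minimal sketch, assuming that a substitution counts as conservative whenever both residues appear together in at least one of the five listed groups; the function name `is_conservative` is hypothetical, and the other criteria mentioned above (polarity, charge, solubility and so on) are not modeled.

```python
# The five groups as listed in the paragraph above (three-letter codes).
CONSERVATIVE_GROUPS = [
    {"ala", "pro", "gly", "glu", "asp", "gln", "asn", "ser", "thr"},
    {"cys", "ser", "tyr", "thr"},
    {"val", "ile", "leu", "met", "ala", "phe"},
    {"lys", "arg", "his"},
    {"phe", "tyr", "trp", "his"},
]


def is_conservative(original, replacement):
    """Return True if both residues share at least one listed group.

    Assumption: identical residues are not a substitution, so the
    function returns False for them.
    """
    a, b = original.lower(), replacement.lower()
    if a == b:
        return False
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)
```

For example, under this encoding a lysine-to-arginine change is classed as conservative, while an aspartic acid-to-lysine change is not, because those two residues never appear in the same group.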

[0048] Polypeptide variants can be prepared by the deletion and/or addition of amino acids that have minimal influence on the biological activity, immunogenicity, secondary structure and/or hydropathic nature of the polypeptide. Variants can be prepared by, for instance, substituting, modifying, deleting or adding one or more amino acid residues in the original sequence. Polypeptide variants preferably exhibit at least about 70%, more preferably at least about 90%, and most preferably at least about 95% sequence homology to the original polypeptide.

[0049] Polypeptide variants include polypeptides that are modified from the original polypeptides either by a natural process, such as a post-translational modification, or by a chemical modification. These modifications are well known in the art. Modifications can occur anywhere in the polypeptide, including the backbone, the amino acid side-chains and the amino or carboxyl termini. It will be appreciated that the same type of modification can be present in the same or varying degrees at several sites in a given polypeptide. Also, a given polypeptide can contain many types of modifications. Polypeptides may be branched, for example, as a result of ubiquitination, and they may be cyclic, with or without branching. Cyclic, branched, and branched cyclic polypeptides can result from natural post-translational processes or be made through synthetic methods. Suitable modifications for this invention include acetylation, acylation, ADP-ribosylation, amidation, covalent attachment of flavin, covalent attachment of a heme moiety, covalent attachment of a nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid derivative, covalent attachment of phosphatidylinositol, cross-linking, cyclization, disulfide bond formation, demethylation, formation of covalent cross-links, formation of cysteine, formation of pyroglutamate, formylation, gamma-carboxylation, glycosylation, GPI anchor formation, hydroxylation, iodination, methylation, myristoylation, oxidation, pegylation, proteolytic processing, phosphorylation, prenylation, racemization, selenoylation, sulfation, transfer-RNA mediated addition of amino acids to proteins such as arginylation, and ubiquitination.

[0050] As used herein, the term “modulation” includes up-regulation, induction, stimulation, potentiation, inhibition, down-regulation or suppression, or relief of inhibition.

[0051] A nucleotide sequence is “operably linked” to another nucleotide sequence if the two sequences are placed into a functional relationship. For example, a coding sequence is operably linked to a 5′ regulatory sequence if the 5′ regulatory sequence can initiate transcription of the coding sequence in an in vitro transcription/translation system or in a host cell. “Operably linked” does not require that the DNA sequences being linked are contiguous to each other. Intervening sequences may exist between two operably linked sequences.

[0052] As used herein, a “disease-free” human refers to a human who does not have NRHK1-related diseases. Disease-free cells, tissues or samples refer to cells, tissues or samples obtained from such disease-free human(s).

[0053] A polynucleotide is “capable of hybridizing” to a gene if the polynucleotide can hybridize to at least one of the following sequences: (1) the sequence of an RNA transcript of the gene, (2) the complementary sequence of an RNA transcript of the gene, (3) the cDNA sequence of an RNA transcript of the gene, (4) the complementary sequence of the cDNA sequence of an RNA transcript of the gene, (5) a genomic sequence of the gene, and (6) the complementary sequence of a genomic sequence of the gene.

[0054] As used herein, sequence “identity” in an alignment can be determined by the standard protein-protein BLAST program (blastp), the standard nucleotide-nucleotide BLAST program (blastn) or the BLAST 2 Sequences program. Suitable BLAST programs can be found at the web site maintained by the National Center for Biotechnology Information (NCBI), National Library of Medicine, Bethesda, Md., USA.

[0055] Human NRHK1 Gene and NRHK1 Kinase

[0056] The present invention identifies a new human gene (NRHK1 gene) that encodes a protein containing sequences homologous to the consensus sequences of the catalytic domain of serine/threonine kinases, pkinases, and tyrosine kinases. The new protein is also distantly related to an NIMA-related protein kinase. The nucleotide sequence encoding NRHK1 and the amino acid sequence of NRHK1 are depicted in SEQ ID NOS:1 and 2, respectively. The NRHK1 gene is localized in locus 9q34 of human chromosome 9. Specifically, the NRHK1 gene is located between genes LOC157890 and LOC57109, and overlaps with gene LOC169436.

[0057] Human chromosome locus 9q34 and the neighboring regions have been associated with multiple diseases, including, but not limited to, colon cancer, prostate cancer, head and neck squamous cell carcinoma, lung adenocarcinoma, bladder cancer, renal cell carcinoma, T cell non-Hodgkin's lymphoma, insulinoma, childhood adrenocortical tumors, chronic myeloid leukemia, tuberous sclerosis, the nail-patella syndrome, lattice dystrophy Type II, primary torsion dystonia, congenital generalized lipodystrophy, and juvenile amyotrophic lateral sclerosis.

[0058] Human NRHK1 gene has 21 exons. The exons are mapped to the nucleotide sequences of human chromosome 9 in Celera genomic database (SEQ ID NO:3). Exons 1-29, 31 and 32 are also mapped to nucleotides 110001 to 137201 in human chromosome 9 of the Entrez Human Genome Sequence Database maintained by NCBI. Table 2 lists the location of these 21 exons in the genomic sequence SEQ ID NO:3. Table 2 also illustrates the corresponding location of each exon in the NRHK1-coding sequence SEQ ID NO:1.

TABLE 2
Exons in Human NRHK1 Gene

Exon Number | Corresponding Sequence in SEQ ID NO:3 | Corresponding Sequence in SEQ ID NO:1
1 | 1-87 | 1-87
2 | 2563-2469 | 88-174
3 | 8734-8778 | 175-219
4 | 10495-10569 | 220-294
5 | 12325-12426 | 295-396
6 | 14404-14474 | 397-467
7 | 15547-15662 | 468-583
8 | 18710-18828 | 584-702
9 | 20019-20182 | 703-866
10 | 21583-21841 | 867-1125
11 | 23864-23951 | 1126-1213
12 | 24833-24949 | 1214-1330
13 | 25277-25418 | 1331-1472
14 | 26158-26298 | 1473-1613
15 | 26640-26725 | 1614-1699
16 | 27298-27432 | 1700-1834
17 | 27841-27930 | 1835-1924
18 | 28115-28243 | 1925-2053
19 | 28335-28463 | 2054-2182
20 | 29204-29344 | 2183-2323
21 | 29667-29836 | 2324-2493
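The exon coordinates in Table 2 lend themselves to a simple consistency check: each exon should span the same number of nucleotides in SEQ ID NO:3 and in SEQ ID NO:1, and the SEQ ID NO:1 ranges should tile the coding sequence without gaps. The following is a minimal sketch of such a check, using the coordinates as printed in Table 2; the names `EXONS`, `span_length` and `check_exons` are hypothetical, and treating a reversed range as an absolute span is an assumption.

```python
# (exon, genomic_start, genomic_end, cdna_start, cdna_end), as printed in Table 2.
EXONS = [
    (1, 1, 87, 1, 87), (2, 2563, 2469, 88, 174), (3, 8734, 8778, 175, 219),
    (4, 10495, 10569, 220, 294), (5, 12325, 12426, 295, 396),
    (6, 14404, 14474, 397, 467), (7, 15547, 15662, 468, 583),
    (8, 18710, 18828, 584, 702), (9, 20019, 20182, 703, 866),
    (10, 21583, 21841, 867, 1125), (11, 23864, 23951, 1126, 1213),
    (12, 24833, 24949, 1214, 1330), (13, 25277, 25418, 1331, 1472),
    (14, 26158, 26298, 1473, 1613), (15, 26640, 26725, 1614, 1699),
    (16, 27298, 27432, 1700, 1834), (17, 27841, 27930, 1835, 1924),
    (18, 28115, 28243, 1925, 2053), (19, 28335, 28463, 2054, 2182),
    (20, 29204, 29344, 2183, 2323), (21, 29667, 29836, 2324, 2493),
]


def span_length(start, end):
    """Inclusive length of a range; a reversed range is read as an absolute span."""
    return abs(end - start) + 1


def check_exons(exons):
    """Return (cdna_contiguous, mismatched_exon_numbers, cdna_total_length)."""
    mismatched = []
    contiguous = True
    expected_next = 1
    for exon, g_start, g_end, c_start, c_end in exons:
        if c_start != expected_next:
            contiguous = False
        expected_next = c_end + 1
        if span_length(g_start, g_end) != span_length(c_start, c_end):
            mismatched.append(exon)
    return contiguous, mismatched, expected_next - 1


contiguous, mismatched, total = check_exons(EXONS)
```

With the values as printed, the SEQ ID NO:1 ranges tile positions 1 to 2493 without gaps, and exon 2 is the only entry whose genomic span (95 nucleotides when 2563-2469 is read as an absolute range) differs from its SEQ ID NO:1 span (87 nucleotides).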

[0059] A conserved domain search using the RPS-BLAST program (RPS-BLAST 2.2.3 [Apr. 24, 2002], available at the BLAST web site maintained by NCBI) showed that NRHK1 contained sequences homologous to the consensus sequences of several protein kinase domains.

[0060] Specifically, the amino acid residues 28 to 297 of NRHK1 are highly homologous to a catalytic domain of a family of Ser/Thr protein kinases (Entrez accession number: smart00220). This kinase family includes C-Jun N-terminal kinase (JNK3), Abelson tyrosine kinase, a calmodulin-binding, vesicle-associated, protein kinase-like protein (1G5), serine/threonine-protein kinase prp4, Cdc2/Cdc28 subfamily of Ser/Thr protein kinases in C. elegans, and ribosomal S6 kinase of C. elegans. FIG. 1 shows that the two sequences share 100.0% alignment with a score of 130 bits and an E value of 2×10⁻³¹. As used in other figures of this invention, “Query” denotes the sequence of NRHK1, and “Sbjct” refers to the sequence being compared to the NRHK1 sequence.

[0061] FIG. 2 shows that the amino acid residues 28 to 296 of NRHK1 also aligned 100.0% with the protein kinase domain of pkinase (Entrez accession number: pfam00069). The alignment has a score of 119 bits, and an E value of 3×10⁻²⁸. This pkinase family includes protein kinase Ck2, wee1-like protein kinase (WEE1hu), and tyrosine-protein kinase RYK.

[0062] FIG. 3 shows the sequence alignment between amino acid residues 59 to 294 of NRHK1 and the N-terminal of a family of tyrosine kinases (Entrez accession number: smart00219). This family includes the tyrosine kinase domain of fibroblast growth factor receptor 1, tyrosine-protein kinase (KIN15/KIN16 subfamily), and a Drosophila receptor protein-tyrosine kinase family member (dr1-P1). The amino acid residues 59-294 in NRHK1's kinase domain have 87.5% sequence identities to smart00219, with a score of 76.4 bits and an E value of 4×10⁻¹⁵.

[0063] FIG. 4 illustrates the sequence alignment between amino acid residues 25 to 285 of NRHK1 and the N-terminal of a Populus x canescens NIMA-related protein kinase (Entrez accession number: AF469649_1). The two sequences share 28% sequence identity with a score of 87.8 bits and an E value of 5×10⁻¹⁶. NIMA-related kinases are a group of protein kinases sharing high amino acid sequence identity with NIMA (never in mitosis gene A), which controls mitosis in Aspergillus nidulans. Specifically, NIMA regulates mitotic chromatin condensation through phosphorylation of histone H3 at serine 10. NIMA-related kinases may also regulate the cell cycle in other eukaryotes, as expression of NIMA can promote mitotic events in yeast, frog or human cells. Moreover, dominant-negative versions of NIMA can adversely affect the progression of human cells into mitosis. Human Nek6 (hNek6), a member of the mammalian NIMA-related kinases, is localized to human chromosome 9q33-34, the same region in chromosome 9 where NRHK1 is located.

[0064] FIG. 5 shows the hydrophobicity profile of NRHK1. The hydrophobicity analysis indicates that NRHK1 kinase is not likely a membrane or transmembrane protein.

[0065] NRHK1 shows significant sequence homology to a human protein kinase-like protein SGK071 (Entrez accession number: AX056458; the nucleotide sequence and the amino acid sequence of SGK071 are depicted in SEQ ID NOS:5 and 6, respectively), which was disclosed in PCT patent application WO00/73469. Analysis using the pairwise BLAST algorithm revealed that NRHK1 and SGK071 share 84% sequence identity at the amino acid level (blastp, matrix: BLOSUM62, gap open: 11, gap extension: 1, x_dropoff: 50, expect: 10.0, wordsize: 3, filter: unchecked), and 90% sequence identity at the nucleotide level (blastn, match: 1, mismatch: −2, gap open: 5, gap extension: 0, x_dropoff: 50, expect: 10.0, wordsize: 11, filter: unchecked).
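For readers unfamiliar with how a percent identity figure is derived from a pairwise alignment, the calculation can be sketched as follows. This is an illustration only, assuming the two sequences have already been aligned (for example by BLAST) and padded with "-" for gaps; the function name `percent_identity` and the toy sequences are hypothetical and are not the NRHK1 or SGK071 sequences.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences carry the same residue.

    Gap columns ("-") count toward the alignment length but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = len(aligned_a)
    if columns == 0:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / columns


# Hypothetical toy alignment: six columns, four identical residues, one gap.
toy_identity = percent_identity("MKV-LT", "MKVALS")
```

The resulting figure depends on the alignment itself, which is why the scoring matrix and gap penalties used to produce the alignment are reported alongside the identity values above.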

[0066] The existence and expression of the NRHK1 gene in humans are supported by various EST sequences. For instance, nucleotides 1-583 of SEQ ID NO:1 are supported by the EST sequences disclosed under GenBank accession numbers BI458908, AL044935, BM015576, AI799141, and Incyte accession number 6854652H1; nucleotides 584-866 of SEQ ID NO:1 are supported by the EST sequences disclosed under GenBank accession numbers AV763896, AI936591, AW183662, and AI554166; nucleotides 867-949 of SEQ ID NO:1 are supported by the EST sequences disclosed under GenBank accession number AW629230 and Incyte accession numbers 6854652H1 and 6883792H1; nucleotides 1565-1707 of SEQ ID NO:1 are supported by the EST sequences disclosed under Incyte accession number 6140228F8; nucleotides 2426-2493 of SEQ ID NO:1 are supported by the EST sequences disclosed under GenBank accession number BQ184985.
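The extent of EST support can be summarized by merging the supported nucleotide ranges of SEQ ID NO:1 listed above and counting the bases they cover. The sketch below uses the five ranges exactly as stated in this paragraph. The function names are hypothetical, and the total length of SEQ ID NO:1 is not restated here, so no coverage fraction is computed.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or abuts the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(pair) for pair in merged]


def covered_length(intervals):
    """Number of distinct positions covered by the intervals."""
    return sum(end - start + 1 for start, end in merge_intervals(intervals))


# Nucleotide ranges of SEQ ID NO:1 reported as supported by EST sequences.
EST_SUPPORTED = [(1, 583), (584, 866), (867, 949), (1565, 1707), (2426, 2493)]

if __name__ == "__main__":
    print(merge_intervals(EST_SUPPORTED))
    print(covered_length(EST_SUPPORTED))
```

The first three ranges abut one another and collapse into a single supported stretch spanning nucleotides 1-949, leaving two further isolated stretches toward the 3′ end.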

[0067] Utility of Protein Kinases

[0068] Protein kinases are involved in the regulation of many critical biological processes such as signal transduction pathways. Malfunctions of cellular signaling have been associated with many diseases. Regulation of signal transduction by cytokines and association of signal molecules with protooncogenes and tumor suppressor genes have been the subjects of intense research. Many therapeutic strategies can now be developed through the synthesis of compounds which activate or inactivate protein kinases.

[0069] The importance of kinases in the etiology of diseases has been well established. Kinase proteins are a major target for drug action and development. A January 2002 survey of ongoing clinical trials in the USA revealed more than 100 clinical trials involving the modulation of kinases. Trials are ongoing in a wide variety of therapeutic indications including asthma, Parkinson's, inflammation, psoriasis, rheumatoid arthritis, spinal cord injuries, muscle conditions, osteoporosis, graft versus host disease, cardiovascular disorders, autoimmune disorders, retinal detachment, stroke, epilepsy, ischemia/reperfusion, breast cancer, ovarian cancer, glioblastoma, non-Hodgkin's lymphoma, colorectal cancer, non-small cell lung cancer, brain cancer, Kaposi's sarcoma, pancreatic cancer, liver cancer, and other tumors. Numerous kinds of modulators of kinase activity are currently in clinical trials including antisense molecules, antibodies, small molecules, and even gene therapy. Accordingly, it is valuable to the field of pharmaceutical development to identify and characterize previously unknown members of the kinase family proteins. The present invention advances the state of the art by providing previously unidentified human kinase proteins which are structurally related to NIMA-related protein kinases.

[0070] Many therapeutic strategies are aimed at critical components in signal transduction pathways. Approaches for regulating kinase gene expression include specific antisense oligonucleotides for inhibiting post-transcriptional processing of the messenger RNA, naturally occurring products and their chemical derivatives to inhibit kinase activity and monoclonal antibodies to inhibit receptor linked kinases. In some cases, kinase inhibitors also allow other therapeutic agents additional time to become effective and act synergistically with current treatments.

[0071] Among the areas of pharmaceutical research that are currently receiving a great deal of attention are the role of phosphorylation in transcriptional control, apoptosis, protein degradation, nuclear import and export, cytoskeletal regulation, and checkpoint signaling. The accumulating knowledge about signaling networks and the proteins involved will be put to practical use in the development of potent and specific pharmacological modulators of phosphorylation-dependent signaling. The rational structure-based design and development of highly specific kinase modulators is becoming routine, and drugs that intercede in signaling pathways are becoming a major class of drugs. The functions of some of the kinases are described below.

[0072] The second messenger dependent protein kinases primarily mediate the effects of second messengers such as cyclic AMP (cAMP), cyclic GMP, inositol triphosphate, phosphatidylinositol 3,4,5-triphosphate, cyclic ADP-ribose, arachidonic acid, diacylglycerol and calcium-calmodulin. The cyclic-AMP dependent protein kinases (PKA) are important members of the STK family. Cyclic-AMP is an intracellular mediator of hormone action in all prokaryotic and animal cells that have been studied. Such hormone-induced cellular responses include thyroid hormone secretion, cortisol secretion, progesterone secretion, glycogen breakdown, bone resorption, and regulation of heart rate and force of heart muscle contraction. PKA is found in all animal cells and is thought to account for the effects of cyclic-AMP in most of these cells. Altered PKA expression is implicated in a variety of disorders and diseases including cancer, thyroid disorders, diabetes, atherosclerosis, and cardiovascular disease.

[0073] Calcium-calmodulin (CaM) dependent protein kinases are also members of the STK family. Calmodulin is a calcium receptor that mediates many calcium regulated processes by binding to target proteins in response to the binding of calcium. The principal target proteins in these processes are CaM dependent protein kinases. CaM-kinases are involved in regulation of smooth muscle contraction (MLC kinase), glycogen breakdown (phosphorylase kinase), and neurotransmission (CaM kinase I and CaM kinase II). CaM kinase I phosphorylates a variety of substrates including the neurotransmitter related proteins synapsin I and II, the gene transcription regulator, CREB, and the cystic fibrosis transmembrane conductance regulator protein, CFTR. CaM kinase II also phosphorylates synapsin at different sites, and controls the synthesis of catecholamines in the brain through phosphorylation and activation of tyrosine hydroxylase. Many of the CaM kinases are activated by phosphorylation in addition to binding to CaM. The kinase may autophosphorylate, or be phosphorylated by another kinase as part of a “kinase cascade”.

[0074] Another ligand-activated protein kinase is 5′-AMP-activated protein kinase (AMPK). Mammalian AMPK is a regulator of fatty acid and sterol synthesis through phosphorylation of the enzymes acetyl-CoA carboxylase and hydroxymethylglutaryl-CoA reductase and mediates responses of these pathways to cellular stresses such as heat shock and depletion of glucose and ATP. AMPK is a heterotrimeric complex comprised of a catalytic alpha subunit and two non-catalytic beta and gamma subunits that are believed to regulate the activity of the alpha subunit. Subunits of AMPK have a much wider distribution in non-lipogenic tissues, such as brain, heart, spleen, and lung, than expected. This distribution suggests that AMPK's functions may extend beyond regulation of lipid metabolism alone.

[0075] The mitogen-activated protein kinases (MAP kinases) are also members of the STK family. MAP kinases also regulate intracellular signaling pathways. They mediate signal transduction from the cell surface to the nucleus via phosphorylation cascades. Several subgroups have been identified, and each manifests different substrate specificities and responds to distinct extracellular stimuli. MAP kinase signaling pathways are present in mammalian cells as well as in yeast. The extracellular stimuli that activate mammalian pathways include epidermal growth factor (EGF), ultraviolet light, hyperosmolar medium, heat shock, endotoxic lipopolysaccharide (LPS), and pro-inflammatory cytokines, such as tumor necrosis factor (TNF) and interleukin-1 (IL-1).

[0076] EGF receptor is found in over half of breast tumors unresponsive to hormone. EGF is found in many tumors, and EGF may be required for tumor cell growth. Antibodies to EGF blocked the growth of tumor xenografts in mice. An antisense oligonucleotide for amphiregulin inhibited growth of a pancreatic cancer cell line.

[0077] Tamoxifen, a protein kinase C inhibitor with anti-estrogen activity, is currently a standard treatment for hormone-dependent breast cancer. The use of this compound may increase the risk of developing cancer in other tissues such as the endometrium. Raloxifene, a related compound, has been shown to protect against osteoporosis. The tissue specificity of inhibitors must be considered when identifying therapeutic targets.

[0078] Signal transduction to the nucleus in response to extracellular stimulation by a growth factor involves the mitogen activated protein (MAP) kinases. MAP kinases are a family of protein serine/threonine kinases which mediate signal transduction from extracellular receptors, heat shock, or UV radiation. Cell proliferation and differentiation in normal cells are under the regulation and control of multiple MAP kinase cascades. Aberrant and deregulated functioning of MAP kinases can initiate and support carcinogenesis. Insulin and IGF-1 also activate a mitogenic MAP kinase pathway that may be important in acquired insulin resistance occurring in type 2 diabetes.

[0079] Many cancers become refractory to chemotherapy by developing a survival strategy involving the constitutive activation of the phosphatidylinositol 3-kinase-protein kinase B/Akt signaling cascade. This survival signaling pathway thus becomes an important target for the development of specific inhibitors that would block its function. PI-3 kinase/Akt signaling is equally important in diabetes. The pathway activated by RTKs subsequently regulates glycogen synthase kinase 3 (GSK3) and glucose uptake. Since Akt has decreased activity in type 2 diabetes, it provides a therapeutic target.

[0080] Protein kinase inhibitors provide much of our knowledge about in vivo regulation and coordination of physiological functions of endogenous peptide inhibitors. A pseudosubstrate sequence within PKC acts to inhibit the kinase in the absence of its lipid activator. A PKC inhibitor, such as chelerythrine, acts on the catalytic domain to block substrate interaction, while calphostin acts on the regulatory domain to mimic the pseudosubstrate sequence and block ATPase activity, or to inhibit cofactor binding.

[0081] Although some protein kinases have, to date, no known system of physiological regulation, many are activated or inactivated by autophosphorylation or phosphorylation by upstream protein kinases. The regulation of protein kinases also occurs during the transcription, post-transcription, and post-translation processes. One mechanism of post-transcriptional regulation is alternative splicing of precursor mRNA. For example, protein kinase C βI and βII are two isoforms of a single PKCβ gene derived from differences in the splicing of the exon encoding the C-terminal 50-52 amino acids. Splicing can be regulated by a kinase cascade in response to peptide hormones, such as insulin and IGF-1. PKC βI and βII have different specificities for phosphorylating members of the mitogen activated protein (MAP) kinase family, for glycogen synthase kinase 3β, for nuclear transcription factors, such as TLS/Fus, and for other nuclear kinases. By inhibiting the post-transcriptional alternative splicing of PKC βII mRNA, PKC βII-dependent processes are inhibited.

[0082] The development of antisense oligonucleotides to inhibit the expression of various protein kinases has been successful. Antisense oligonucleotides are short lengths of synthetically manufactured, chemically modified DNA or RNA designed to specifically interact with mRNA transcripts encoding target proteins. The interaction of the antisense moiety with mRNA inhibits protein translation and, in some cases, post-transcriptional processing (e.g., alternative splicing and stability) of mRNA. Antisense oligonucleotides have been developed to alter alternative splicing of mRNA forms for inhibiting the translation of PKCα.

[0083] Protein kinase C isoforms have been implicated in cellular changes observed in the vascular complications of diabetes. Hyperglycemia is associated with increased levels of PKCα and β isoforms in renal glomeruli of diabetic rats. Oral administration of a PKCβ inhibitor prevented the increased mRNA expression of TGF-β1 and extracellular matrix component genes. Administration of the specific PKCβ inhibitor (LY333531) also normalized levels of cytokines, caldesmon, and hemodynamics of retinal and renal blood flow. Overexpression of the PKCβ isoform in the myocardium resulted in cardiac hypertrophy and failure. The use of LY333531 to prevent adverse effects of cardiac PKCβ overexpression in diabetic subjects is under investigation. The compound is also in Phase I/II clinical trials for diabetic retinopathy and diabetic macular edema, indicating that it may be pharmacodynamically active.

[0084] PRK (proliferation-related kinase) is a serum/cytokine inducible STK that is involved in regulation of the cell cycle and cell proliferation in human megakaryocytic cells. PRK is related to the polo (derived from human polo gene) family of STKs implicated in cell division. PRK is down-regulated in lung tumor tissue and may be a proto-oncogene whose deregulated expression in normal tissue leads to oncogenic transformation. Altered MAP kinase expression is implicated in a variety of disease conditions including cancer, inflammation, immune disorders, and disorders affecting growth and development.

[0085] DNA-dependent protein kinase (DNA-PK) is involved in the repair of double-strand breaks in mammalian cells. This enzyme requires ends of double stranded DNA or transitions from single-stranded to double-stranded DNA in order to act as a serine/threonine kinase. Cells with defective or deficient DNA-PK activity are unable to repair radiation induced DNA double-strand breaks and are consequently very sensitive to the lethal effects of ionizing radiation. Inhibition of DNA-PK has the potential to increase the efficacy of anti-tumor treatment with radiation or chemotherapeutic agents.

[0086] The cyclin-dependent protein kinases (CDKs) are another group of STKs that control the progression of cells through the cell cycle. Cyclins are small regulatory proteins that act by binding to and activating CDKs that then trigger various phases of the cell cycle by phosphorylating and activating selected proteins involved in the mitotic process. CDKs are unique in that they require multiple inputs to become activated. In addition to the binding of cyclin, CDK activation requires the phosphorylation of a specific threonine residue and the dephosphorylation of a specific tyrosine residue.

[0087] Cellular inhibitors of CDKs also play a major role in cell cycle progression. Alterations in the expression, function, and structure of cyclin and CDK are encountered in the cancer phenotype. Therefore, CDKs may be important targets for new cancer therapeutic agents.

[0088] Chemotherapy resistant cells tend to escape apoptosis. Under certain circumstances, inappropriate CDK activation may even promote apoptosis by encouraging the progression of the cell cycle under unfavorable conditions, i.e., attempting mitosis while DNA damage is largely unrepaired.

[0089] Purines and purine analogs act as CDK inhibitors. Flavopiridol is a flavonoid that causes 50% growth inhibition of tumor cells at 60 nM. It also inhibits EGFR and protein kinase A. Flavopiridol induces apoptosis and inhibits lymphoid, myeloid, colon, and prostate cancer cells grown in vivo as tumor xenografts in nude mice.

[0090] Staurosporine and its derivative, UCN-01, in addition to inhibiting protein kinase C, inhibit cyclin B/CDK (IC50=3 to 6 nM). Staurosporine is toxic, but its derivative 7-hydroxystaurosporine (UCN-01) has anti-tumor properties and is in clinical trials. UCN-01 affects the phosphorylation of CDKs and alters the cell cycle checkpoint functioning. These compounds illustrate that multiple intracellular targets may be affected as the concentration of an inhibitor is increased within cells.

[0091] Protein tyrosine kinases, PTKs, specifically phosphorylate tyrosine residues on their target proteins and may be divided into transmembrane, receptor PTKs and non-transmembrane, non-receptor PTKs. Transmembrane protein tyrosine kinases are receptors for most growth factors. Binding of a growth factor to the receptor activates the transfer of a phosphate group from ATP to selected tyrosine side chains of the receptor and other specific proteins. Growth factors (GF) associated with receptor PTKs include: epidermal GF, platelet-derived GF, fibroblast GF, hepatocyte GF, insulin and insulin-like GFs, nerve GF, vascular endothelial GF, and macrophage colony stimulating factor.

[0092] Since RTKs stimulate tumor cell proliferation, inhibitors of RTKs may inhibit the growth and proliferation of such cancers. Inhibitors of RTKs are also useful in preventing tumor angiogenesis and can eliminate support from the host tissue by targeting RTKs located on vascular cells, such as blood vessel endothelial cells and stromal fibroblasts. For example, VEGF stimulates endothelial cell growth during angiogenesis, and increases the permeability of tumor vasculature so that proteins and other growth factors become accessible to the tumor. Broad-spectrum antitumor efficacy of an oral dosage form of an inhibitor of VEGF signaling has been reported. Thus, inhibition of VEGF receptor signaling presents an important therapeutic target. An extracellular receptor can also be a target for inhibition. For example, the EGF receptor family and its ligands are overexpressed and exist as an autocrine loop in many tumor types.

[0093] Increasing knowledge of the structure and activation mechanism of RTKs and the signaling pathways controlled by tyrosine kinases provided the possibility for the development of target specific drugs and new anti-cancer therapies. Approaches towards the prevention or interception of deregulated RTK signaling include the development of selective components that target either the extracellular ligand-binding domain or the intracellular substrate binding region.

[0094] The most successful strategy to selectively kill tumor cells is the use of monoclonal antibodies (mAbs) that are directed against the extracellular domain of RTKs, which are critically involved in cancer and are expressed at the surface of tumor cells. In the past years, recombinant antibody technology has made enormous progress in the design, selection and production of newly engineered antibodies. It is also possible to generate humanized antibodies, human-mouse chimeric or bispecific antibodies for targeted cancer therapy. Mechanistically, anti-RTK mAbs might work by blocking the ligand-receptor interaction and therefore inhibiting ligand-induced RTK signaling and increasing RTK down-regulation and internalization. In addition, binding of mAbs to certain epitopes on the cancer cells may induce immune-mediated responses, such as opsonization and complement-mediated lysis, and trigger antibody-dependent cellular cytotoxicity by macrophages or natural killer cells. In recent years, it became evident that mAbs control tumor growth by altering the intracellular signaling pattern inside the targeted tumor cell, leading to growth inhibition and/or apoptosis. In addition, bispecific antibodies can bridge selected surface molecules on a target cell with receptors on an effector cell, thus triggering cytotoxic responses against the target cell. Despite the toxicity that has been seen in clinical trials of bispecific antibodies, advances in antibody engineering, characterization of tumor antigens and immunology might help to produce rationally designed bispecific antibodies for anti-cancer therapy.

[0095] Another promising approach to inhibiting aberrant RTK signaling is to develop small molecule drugs that selectively interfere with the intrinsic tyrosine kinase activity and thereby block receptor autophosphorylation and activation of downstream signal transducers. The tyrphostins, which belong to the quinazolines, are one important group of such inhibitors that compete with ATP for the ATP binding site at the receptor's tyrosine kinase domain and some members of the group have been shown to specifically inhibit the EGFR. Potent and selective inhibitors of receptors involved in neovascularization have been developed and are now undergoing clinical evaluation. New classes of tyrosine kinase inhibitors (TKIs) with increased potency and selectivity, higher in vitro and in vivo efficacy and decreased toxicity have been developed using the advantages of structure-based drug design, crystallographic structure information, combinatorial chemistry and high-throughput screening.

[0096] Recombinant immunotoxins provide another possibility of target-selective drug design. Recombinant immunotoxins are composed of a bacterial or plant toxin either fused or chemically conjugated to a specific ligand, such as the variable domains of the heavy and light chains of mAbs or to a growth factor. Immunotoxins may contain bacterial toxins, such as Pseudomonas exotoxin A or diphtheria toxin, or plant toxins, such as ricin A or clavin. These recombinant molecules can selectively kill their target cells when internalized after binding to cell surface receptors of the target cells.

[0097] The use of antisense oligonucleotides represents another strategy to inhibit the activation of receptor tyrosine kinases (RTKs). Antisense oligonucleotides are short pieces of synthetic DNA or RNA that are designed to interact with the mRNA to block the translation and thus the expression of the target proteins. Antisense oligonucleotides interact with the mRNA by Watson-Crick base-pairing and are therefore highly specific to the target protein. Several preclinical and clinical studies suggest that antisense therapy might be therapeutically useful for the treatment of solid tumors.
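Because the interaction is governed by Watson-Crick base-pairing, the sequence of an antisense DNA oligonucleotide is the reverse complement of the mRNA segment it targets. The following sketch illustrates that derivation only. The function name and the example segment are hypothetical, and practical oligonucleotide design involves further considerations, such as chemical modification and target accessibility, that are not modeled here.

```python
# DNA base that pairs with each RNA base (Watson-Crick pairing).
_DNA_PARTNER = {"A": "T", "C": "G", "G": "C", "U": "A"}


def antisense_dna(mrna_segment: str) -> str:
    """Return the antisense DNA oligo (written 5'->3') for an mRNA segment (5'->3')."""
    seq = mrna_segment.upper()
    unknown = set(seq) - set(_DNA_PARTNER)
    if unknown:
        raise ValueError(f"unexpected RNA bases: {sorted(unknown)}")
    # Pair each base, reading the target backwards so the result is 5'->3'.
    return "".join(_DNA_PARTNER[base] for base in reversed(seq))


if __name__ == "__main__":
    # Hypothetical target segment, not taken from any sequence in this description.
    print(antisense_dna("AUGGCC"))
```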

[0098] The potential of RTKs and their relevant signaling as selective anti-cancer targets for therapeutic intervention has been recognized. As a consequence, a variety of successful target specific drugs such as mAbs and RTK inhibitors have been developed and are currently being evaluated in clinical trials. Table 3 summarizes the most successful drugs against receptor tyrosine kinase signaling which are currently evaluated in clinical phases or have already been approved by the FDA.

TABLE 3
RTK Drugs Currently Under Clinical Evaluation

RTK               Drug                     Company          Description                                       Status
EGFR              ZD1839 (Iressa)          AstraZeneca      TKI that inhibits EGFR signaling                  Phase III
EGFR              Cetuximab (C225)         ImClone Systems  Mab directed against EGFR                         Phase III
EGFR              EGF fusion protein       Seragen          Recombinant diphtheria toxin-hEGF fusion protein  Phase II
HER2              Trastuzumab (Herceptin)  Genentech        Mab directed against HER2                         Approved by the FDA in 1998
IGF-IR            INX-4437                 INEX USA         Antisense oligonucleotides targeting IGF-IR       Phase I
VEGFR             SU5416                   SUGEN            TKI that inhibits VEGFR2                          Phase II
VEGFR/FGFR/PDGFR  SU6668                   SUGEN            RTK inhibition of VEGFR, FGFR, and PDGFR          Phase I

[0099] Non-receptor PTKs lack transmembrane regions and, instead, form complexes with the intracellular regions of cell surface receptors. Receptors that function through non-receptor PTKs include those for cytokines, hormones (growth hormone and prolactin) and antigen-specific receptors on T and B lymphocytes.

[0100] Many of the PTKs were first identified as the products of mutant oncogenes in cancer cells where their activation was no longer subject to normal cellular controls. In fact, about one third of the known oncogenes encode PTKs, and it is well known that cellular transformation (oncogenesis) is often accompanied by increased tyrosine phosphorylation activity.

[0101] Many tyrosine kinase inhibitors, such as flavopiridol, genistein, erbstatin, lavendustin A, staurosporine, and UCN-01, are derived from natural products. Inhibitors directed to the ATP binding site are also available. Signals from RTKs can also be inhibited at other target sites such as nuclear tyrosine kinases, membrane anchors (inhibition of farnesylation) and transcription factors.

[0102] Targeting the signaling potential of growth promoting tyrosine kinases such as EGFR, HER2, PDGFR, src, and abl, will block tumor growth while blocking IGF-I and TRK will interfere with tumor cell survival. Inhibition of these kinases will lead to tumor shrinkage and apoptosis. Flk1/KDR and src are kinases necessary for neovascularization (angiogenesis) of tumors. Inhibition of these kinases will slow tumor growth and decrease metastases.

[0103] Inhibitors of RTKs suppress tumor development by preventing cell migration, invasion and metastases. These drugs are likely to increase the time required for tumor progression, and may inhibit or attenuate the aggressiveness of the disease but may not initially result in measurable tumor regression.

[0104] An example of cancer arising from a defective tyrosine kinase is a class of ALK positive lymphomas referred to as “ALKomas” which display inappropriate expression of a neural-specific tyrosine kinase, anaplastic lymphoma kinase (ALK).

[0105] Iressa (ZD1839) is an orally active selective EGF-R inhibitor. This compound disrupts signaling involved in cancer cell proliferation. Clinical data indicate that this agent is well tolerated by patients undergoing Phase I/II clinical trials. The compound has shown promising cytotoxicity towards several cancer cell lines.

[0106] Since the majority of protein kinases are expressed in the brain, often in a neuron-specific fashion, protein phosphorylation must play a key role in the development and function of the vertebrate central nervous system. Thus neuron-specific kinases are well established as targets for the development of pharmacologically active modulators.

[0107] NIMA-related kinases are a group of protein kinases sharing high amino acid sequence identities with NIMA (never in mitosis gene A), a protein kinase that, in cooperation with p34cdc2 protein kinase, regulates mitosis in Aspergillus nidulans. Both p34cdc2/cyclin B and NIMA have to be correctly activated before mitosis can be initiated in this species, and p34cdc2/cyclin B plays a role in the mitosis-specific activation of NIMA. In addition, both kinases have to be proteolytically destroyed before mitosis can be completed. NIMA regulates mitotic chromatin condensation through phosphorylation of histone H3 at serine 10. NIMA-related kinases may also regulate the cell cycle in other eukaryotes, as expression of NIMA can promote mitotic events in yeast, frog or human cells. In Xenopus oocytes, NIMA induces germinal vesicle breakdown without activating Mos, CDC2, or MAP kinase. In HeLa cells, dominant-negative versions of NIMA adversely affect the progression into mitosis by causing a specific G2 arrest.

[0108] Mutation of threonine 199 (conserved in all NIMA-related kinases) inhibited NIMA beta-casein kinase activity and abolished its in vivo function. This site conforms to a minimal consensus phosphorylation site for NIMA (FXXT) and is analogous to the autophosphorylation site of cyclic-AMP-dependent protein kinases. Mutations in an NIMA-related kinase gene, Nek1, cause pleiotropic effects including a progressive polycystic kidney disease in mice.
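Candidate sites matching the minimal FXXT consensus can be located in a protein sequence with a simple pattern scan, where X denotes any residue. The sketch below is illustrative only. The function name and the example sequence are hypothetical, and a sequence match by itself does not establish that a site is phosphorylated in vivo.

```python
import re

# Lookahead so that overlapping FXXT matches are all reported.
_FXXT = re.compile(r"(?=(F..T))")


def find_fxxt_sites(protein: str):
    """Return (position of F, position of T, matched motif) for each FXXT site.

    Positions are 1-based, following the usual residue numbering convention.
    """
    sites = []
    for match in _FXXT.finditer(protein.upper()):
        f_pos = match.start() + 1
        sites.append((f_pos, f_pos + 3, match.group(1)))
    return sites


if __name__ == "__main__":
    # Hypothetical sequence chosen to contain overlapping motifs.
    for site in find_fxxt_sites("MFAKTFFGTT"):
        print(site)
```

The lookahead form is used because two candidate sites may overlap, for example when consecutive phenylalanine residues are each followed by a threonine three positions downstream.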

[0109] Within the human NIMA-related kinase (Nek) family, the Nek2 protein kinase is the closest known mammalian relative of the NIMA of Aspergillus nidulans. The two kinases share 47% sequence identity over their catalytic domains and display a similar cell cycle-dependent expression peaking at the G2 to M phase transition. It was found that recombinant Nek2 is active as a serine/threonine protein kinase and may undergo autophosphorylation. Both human Nek2 and fungal NIMA phosphorylate a similar, albeit not identical, set of proteins and synthetic peptides, and beta-casein was found to be a suitable substrate for assaying Nek2 in vitro. In HeLa cells, the Nek2 activity parallels its abundance, being low during M and G1 but high during S and G2 phase. Human Nek2 localizes to the centrosome. Overexpression of active Nek2 induces a striking splitting of centrosomes, whereas prolonged expression of either active or inactive Nek2 leads to dispersal of centrosomal material and loss of a focused microtubule-nucleating activity. These results indicate that one function of human Nek2 relates to the centrosome cycle.

[0110] Human Nek3 was cloned by RT-PCR. A high level of Nek3 expression was detected in testis, ovary, and brain, with low-level expression being detected in most of the other tissues. Nek3 mRNA was detected in all the proliferating cell lines studied, and the amount did not change during the cell cycle. The human Nek3 gene was assigned to human chromosome 13 by somatic cell hybrids and 13q14.2 by radiation hybrid mapping.

[0111] Human Nek6 is localized to human chromosome 9q33-34, the same region where NRHK1 is located. Nek6 transcripts appear to be expressed in most of the tissues.

[0112] Human Nek7 is 77% identical to human Nek6. Phylogenetic analysis suggests that Nek6, Nek7, and C. elegans F19H6.1 constitute a subfamily within the NIMA family of protein kinases. Nek7 expression was restricted to a subset of tissues including lung, muscle, testis, brain, heart, liver, leukocyte, and spleen. The human NEK7 gene was assigned to human chromosome 1 by somatic cell hybrids and 1q31.3 by radiation hybrid mapping.

[0113] Recently, human Nek8, a new human NIMA-related kinase, and its candidate substrate Bicd2 were isolated and cloned. Nek8 contains an N-terminal catalytic domain homologous to the Nek family of protein kinases, a central domain with homology to RCC1, a guanine nucleotide exchange factor for the GTPase Ran, and a C-terminal coiled-coil domain. Like Nek2, Nek8 prefers beta-casein over other exogenous substrates, has shared biochemical requirements for kinase activity, and is capable of autophosphorylation and oligomerization. Nek8 activity is not cell cycle regulated, but like Nek3, levels are consistently higher in G(0)-arrested cells. Bicd2 is a protein that co-chromatographed with Nek8 activity. This protein is a human homolog of the Drosophila protein Bicaudal D, a coiled-coil protein. Bicd2 is phosphorylated by Nek8 in vitro, and the endogenous proteins associate in vivo. Bicd2 localizes to cytoskeletal structures, and its subcellular localization is dependent on microtubule morphology. Treatment of cells with nocodazole leads to dramatic reorganization of Bicd2, and correlates with Nek8 phosphorylation. It thus appears that both Nek8 and Bicd2 are associated with cell cycle independent microtubule dynamics.

[0114] In summary, kinase proteins are a major target for drug action and development.

[0115] Accordingly, it is valuable to the field of pharmaceutical development to identify and characterize previously unknown members of kinase proteins. The present invention advances the state of the art by providing a previously unidentified human kinase protein that has sequence and structure similarities to the catalytic domain of protein kinases. Specifically, the kinase domain in NRHK1 shares high sequence identities with the corresponding domains in serine/threonine kinases, pkinases, and a family of tyrosine kinases. The kinase domain in NRHK1 is also distantly related to the sequences of NIMA-related kinases. Functional studies in diverse species have implicated NIMA-related kinases in G(2)/M progression, chromatin condensation and centrosome regulation. The kinase domain in NRHK1, either in its native form or in a mutant form, can be used to affect the function of the corresponding domain in other kinases. This domain can also be used to phosphorylate suitable substrates. The substrate peptides can be conjugated to antibodies, and the phosphate groups added to the substrate peptides can be radioactively or fluorescently labeled. Antibodies thus labeled can be used in various detection assays, as appreciated by one of skill in the art.

[0116] NRHK1 gene and the gene product can be used as a molecular marker for diagnosing, prognosing, and monitoring the treatment of disorders related to the aberrant expression of NRHK1. In addition, the NRHK1 gene can be used to screen for potential agents or drugs capable of enhancing or inhibiting the NRHK1 gene expression in human cells. The NRHK1 gene products (polynucleotide and polypeptide) can be used to screen for potential agents or drugs capable of enhancing or inhibiting NRHK1 activity. Furthermore, various therapeutic methods for treating disorders related to the aberrant expression of NRHK1 can be designed based on the NRHK1 gene, its variants, or the agents/drugs that affect the expression of the NRHK1 gene or the activity of the NRHK1 gene products.

[0117] The following subsections illustrate examples of the utilities of human NRHK1 gene and NRHK1 kinase. Various changes and modifications within the spirit and scope of the present invention will become apparent to those skilled in the art from the present description.

[0118] Polynucleotides and Variants Thereof

[0119] One aspect of the invention pertains to isolated polynucleotide probes capable of hybridizing to the NRHK1 gene or its transcripts, such as NRHK1 mRNAs. These probes can be used to detect the expression level of the NRHK1 gene in human tissue or cells. The present invention also contemplates polynucleotide fragments for use as PCR primers for the amplification or mutation of the NRHK1 gene or the NRHK1 kinase-coding sequences. Another aspect of the invention pertains to isolated polynucleotides that encode NRHK1, or a fragment or mutant thereof. These polynucleotides can be used for expressing NRHK1, or a fragment or mutant thereof. The protein products thus expressed can be used to screen for agents/drugs that modulate an activity of NRHK1. In addition, these polynucleotides can be used to design gene therapy vectors which target the expression of the NRHK1 gene or an activity of NRHK1 in humans.

[0120] A polynucleotide comprising SEQ ID NO:1 or SEQ ID NO:3 can be prepared using standard molecular biology techniques as appreciated by one of ordinary skill in the art. For instance, primers derived from the 5′ and 3′ ends of SEQ ID NO:1 can be used to amplify mRNAs isolated from human tissues. The cDNA thus produced contains SEQ ID NO:1. Likewise, primers for amplifying the human genomic sequence containing SEQ ID NO:3 can be designed and used to prepare the genomic sequence of the NRHK1 gene. A variant (such as a homolog) or a fragment of SEQ ID NO:1 or SEQ ID NO:3 can be similarly prepared. Alternatively, probes can be designed to screen for cDNA or genomic sequence libraries in order to identify polynucleotide molecules comprising the full-length or fragments of SEQ ID NO:1 or SEQ ID NO:3. The molecules thus identified can be used to create suitable vectors comprising the full-length SEQ ID NO:1 or SEQ ID NO:3.

[0121] Polynucleotides capable of hybridizing to the NRHK1 gene can be prepared by standard synthetic techniques, e.g., using an automated DNA synthesizer. Preferably, the polynucleotide probes can hybridize to the NRHK1 gene under reduced stringent conditions, stringent conditions, or highly stringent conditions. In one embodiment, the polynucleotides comprise at least 15, 20, 25, 30, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000 or more consecutive nucleotides of SEQ ID NO:1. Any fragments of SEQ ID NO:1 and SEQ ID NO:3 may be used as hybridization probes or PCR primers for the NRHK1 gene or its transcripts. The probes/primers can be substantially purified.

[0122] In a preferred embodiment, the hybridization probes for the NRHK1 gene comprise a label group. The label group can be a radioisotope, a fluorescent compound, an enzyme, or an enzyme co-factor. Probes thus labeled can be used as part of a diagnostic kit for determining the expression level of the NRHK1 gene in human tissues.

[0123] This invention encompasses human NRHK1 gene homologs in other species. These homologs can be determined by searching different sequence databases, such as the Entrez/GenBank sequence databases maintained by the NCBI. The invention also encompasses polynucleotide molecules which are structurally different from the molecules described above, but have substantially the same properties as the molecules described above. Such molecules include allelic variants, which will be described below in greater detail.

[0124] DNA sequence polymorphism in human NRHK1 gene exists among different individuals due to natural allelic variations. An allele is one of a group of genes which occur alternatively at a given genetic locus. DNA polymorphisms that affect the RNA expression level of the NRHK1 gene can also exist, e.g., through affecting the regulation or degradation of expression of the gene. The present invention contemplates all allelic variants of human NRHK1 gene. Allelic variants and other homologs of the NRHK1 gene can be isolated using probes/primers derived from SEQ ID NO:1 or SEQ ID NO:3.

[0125] It should, of course, be understood that SEQ ID NO:1 and SEQ ID NO:3 can be modified. The modified polynucleotides can comprise one or more mutations. These mutations can be substitutions, additions or deletions of 1, 2, 3, 5, 10, 15, 20 or more nucleotide residues in SEQ ID NO:1 or SEQ ID NO:3. Standard techniques can be used, such as site-directed mutagenesis or PCR-mediated mutagenesis. Preferably, these mutations create conservative amino acid substitutions. Alternatively, mutations can be introduced randomly along all or part of the NRHK1 gene or its cDNA, such as by saturation mutagenesis. Following mutagenesis, the encoded proteins can be expressed recombinantly and their activities can be determined.

[0126] In one embodiment, nucleotide substitutions leading to amino acid substitutions at “non-essential” amino acid residues can be introduced. A “non-essential” amino acid residue is a residue that can be altered without changing the biological activity of the protein. In contrast, an “essential” amino acid residue is required for the biological activity of the protein. Amino acid residues that are conserved among allelic variants or homologs of the NRHK1 gene from different species preferably are not changed in the present invention.

[0127] Accordingly, another aspect of the invention pertains to NRHK1 proteins that contain changes in amino acid residues that are not essential for the biological activity of NRHK1. These proteins differ in amino acid sequence from the original human NRHK1 kinase, but retain its biological activity. In one embodiment, the modified protein comprises an amino acid sequence at least about 85%, 90%, 95%, 98% or more homologous to SEQ ID NO:2.

[0128] In another embodiment, NRHK1 proteins contain mutations in amino acid residues which result in inhibition of NRHK1 activity. These mutated NRHK1 proteins can be used to inhibit NRHK1 activity in patients with disorders related to the aberrant expression of NRHK1.

[0129] A polynucleotide of this invention can be further modified to increase its stability in vivo. Possible modifications include, but are not limited to, the addition of flanking sequences at the 5′ and/or 3′ ends; the use of phosphorothioate or 2′-O-methyl rather than phosphodiester linkages in the backbone; and/or the inclusion of nontraditional bases such as inosine, queosine and wybutosine, as well as acetyl-, methyl-, thio- and other modified forms of adenine, cytidine, guanine, thymine and uridine.

[0130] Polynucleotide molecules which are antisense to the NRHK1 gene can be prepared. An “antisense” polynucleotide comprises a nucleotide sequence which is complementary to a “sense” polynucleotide which encodes a protein. An antisense polynucleotide can bind via hydrogen bonds to the sense polynucleotide.

[0131] Antisense polynucleotides of the invention can be designed according to the rules of Watson and Crick base pairing. The antisense polynucleotide molecule can be complementary to the entire coding region or part of the coding region of the NRHK1 gene. The antisense polynucleotide molecule can also be complementary to a “noncoding region” in the coding strand of the NRHK1 gene. Preferably, the antisense polynucleotide is an oligonucleotide which is antisense to only a portion of the NRHK1 gene. An antisense polynucleotide can be, for example, about 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 nucleotides in length. An antisense polynucleotide of the invention can be constructed using chemical synthesis and enzymatic ligation reactions as appreciated by one of ordinary skill in the art. For example, an antisense polynucleotide can be chemically synthesized using naturally occurring nucleotides or variously modified nucleotides designed to increase the biological stability of the molecules or to increase the physical stability of the duplex formed between the antisense and sense polynucleotides. 
Examples of modified nucleotides which can be used to generate the antisense polynucleotide include 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine, 5-(carboxyhydroxymethyl) uracil, 5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5′-methoxycarboxymethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyl adenosine, uracil-5-oxyacetic acid, wybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid methylester, 3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, and 2,6-diaminopurine. Phosphorothioate derivatives and acridine substituted nucleotides can also be used. Alternatively, the antisense polynucleotide can be produced biologically using an expression vector into which a polynucleotide has been subcloned in an antisense orientation (i.e., RNA transcribed from the inserted polynucleotide will be of an antisense orientation to the target polynucleotide of interest).
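By way of illustration only, the Watson-Crick design rule described above can be sketched in a few lines of Python. The function name and the 18-nucleotide input below are hypothetical placeholders; the input is not taken from SEQ ID NO:1 or SEQ ID NO:3, and the sketch covers unmodified DNA bases only.

```python
# Illustrative sketch: derive an antisense oligonucleotide from a sense-strand
# DNA segment by Watson-Crick pairing. Names and the example are hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def antisense_oligo(sense_dna: str) -> str:
    """Return the reverse complement of a sense segment, written 5'->3'."""
    sense_dna = sense_dna.upper()
    # Reverse the strand, then pair each base with its complement.
    return "".join(COMPLEMENT[base] for base in reversed(sense_dna))


example_sense = "ATGGCCTTAGACGTTCAG"  # made-up 18-nt sense segment
print(antisense_oligo(example_sense))  # prints CTGAACGTCTAAGGCCAT
```

Applying the function twice returns the original sense segment, which gives a quick check that the complement table is symmetric.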

[0132] The antisense polynucleotides of the invention can be administered to a subject or applied in situ such that they hybridize or bind to cellular mRNAs and/or genomic DNAs that encode NRHK1 kinase, thereby inhibiting the expression of NRHK1 kinase. The hybridization can result in a stable duplex via conventional nucleotide complementarity. An example route for administering antisense polynucleotides includes direct injection at a tissue site. Antisense polynucleotides can also be modified first, and then administered systemically. For example, for systemic administration, antisense molecules can be modified such that they specifically bind to receptors or antigens expressed on a selected cell surface. Suitable modifications include linking the antisense polynucleotides to peptides or antibodies which bind to the cell surface receptors or antigens. In addition, the antisense polynucleotides can be delivered to cells using vectors. To achieve sufficient intracellular concentrations of the antisense molecules, strong pol II or pol III promoters may be used in the vectors.

[0133] In one embodiment, the antisense polynucleotides are α-anomeric polynucleotides. An α-anomeric polynucleotide molecule forms a specific double-stranded hybrid with a complementary RNA in which, contrary to the usual β-units, the strands run parallel to each other. The antisense polynucleotide molecule can also comprise a 2′-O-methylribonucleotide or a chimeric RNA-DNA analogue.

[0134] In another embodiment, the antisense polynucleotide is a ribozyme. Ribozymes are catalytic RNA molecules with ribonuclease activity which are capable of cleaving a single-stranded polynucleotide, such as an mRNA, to which they have a complementary region. Thus, ribozymes (e.g., hammerhead ribozymes described in Haseloff and Gerlach, Nature 334:585-591, 1988) can be used to catalytically cleave mRNA transcripts of NRHK1 in order to inhibit its expression. A ribozyme having specificity for the NRHK1 gene or its transcripts can be designed based upon SEQ ID NO:1 or 3. mRNAs transcribed from the NRHK1 gene can be used to select from a pool of RNA molecules a catalytic RNA having a specific ribonuclease activity.

[0135] Alternatively, the expression of the NRHK1 gene can be inhibited by using nucleotide sequences complementary to the regulatory region (e.g., the promoter and/or enhancers). These nucleotide sequences can form triple helical structures that prevent transcription of the gene in the target cells.

[0136] Expression of the NRHK1 gene can also be inhibited using RNA interference (“RNAi”). RNAi is a phenomenon in which the introduction of double-stranded RNA (dsRNA) into certain organisms or cell types causes degradation of the homologous mRNA. First discovered in the nematode Caenorhabditis elegans, RNAi has since been found to operate in a wide range of organisms. For example, in mammalian cells, introduction of long dsRNA (>30 nt) can initiate a potent antiviral response, exemplified by nonspecific inhibition of protein synthesis and RNA degradation. RNA interference provides a mechanism of gene silencing at the mRNA level. In recent years, RNAi has become an endogenous and potent gene-specific silencing technique that uses double-stranded RNAs (dsRNA) to mark a particular transcript for degradation in vivo. It also offers an efficient and broadly applicable approach for gene knock-out. In addition, RNAi technology can be used for therapeutic purposes. For example, RNAi targeting Fas-mediated apoptosis has been shown to protect mice from fulminant hepatitis. RNAi technology has been disclosed in numerous publications, such as U.S. Pat. Nos. 5,919,619, 6,506,559 and PCT Publication Nos. WO99/14346, WO01/70949, WO01/36646, WO00/63364, WO00/44895, WO01/75164, WO01/92513, WO01/68836 and WO01/29058.

[0137] In a preferred embodiment, short interfering RNAs (siRNA) are used. siRNAs are dsRNAs having 19-25 nucleotides. siRNAs can be produced endogenously by degradation of longer dsRNA molecules by an RNase III-related nuclease called Dicer. siRNAs can also be introduced into a cell exogenously or by transcription of an expression construct. Once formed, the siRNAs assemble with protein components into endoribonuclease-containing complexes known as RNA-induced silencing complexes (RISCs). An ATP-dependent unwinding of the siRNA activates the RISCs, which in turn target the complementary mRNA transcript by Watson-Crick base-pairing, thereby cleaving and destroying the mRNA. Cleavage of the mRNA takes place near the middle of the region bound by the siRNA strand. This sequence-specific mRNA degradation results in gene silencing.

[0138] At least two ways can be employed to achieve siRNA-mediated gene silencing. First, siRNAs can be synthesized in vitro and introduced into cells to transiently suppress gene expression. Synthetic siRNA provides an easy and efficient way to achieve RNAi. siRNAs are duplexes of short mixed oligonucleotides which can include, for example, 19 nucleotides with symmetric 2-nucleotide 3′ overhangs. Using synthetic 21-nucleotide siRNA duplexes (19 RNA bases followed by a UU or dTdT 3′ overhang), sequence-specific gene silencing can be achieved in mammalian cells. These siRNAs can specifically suppress targeted gene translation in mammalian cells without the activation of dsRNA-dependent protein kinase (PKR) that longer dsRNA can trigger, which may result in non-specific repression of translation of many proteins.

[0139] Second, siRNAs can be expressed in vivo from vectors. This approach can be used to stably express siRNAs in cells or transgenic animals. In one embodiment, siRNA expression vectors are engineered to drive siRNA transcription from polymerase III (pol III) transcription units. Pol III transcription units are suitable for hairpin siRNA expression, since they deploy a short AT-rich transcription termination site that leads to the addition of 2-nucleotide overhangs (UU) to hairpin siRNAs, a feature that is helpful for siRNA function. The Pol III expression vectors can also be used to create transgenic mice that express siRNA.

[0140] In another embodiment, siRNAs can be expressed in a tissue-specific manner.

[0141] Under this approach, long double-stranded RNAs (dsRNAs) are first expressed from a promoter (such as CMV (pol II)) in the nuclei of selected cell lines or transgenic mice. The long dsRNAs are processed into siRNAs in the nuclei (e.g., by Dicer). The siRNAs exit from the nuclei and mediate gene-specific silencing. A similar approach can be used in conjunction with tissue-specific (pol II) promoters to create tissue-specific knockdown mice.

[0142] Any 3′ dinucleotide overhang, such as UU, can be used for siRNAs. In some cases, G residues in the overhang may be avoided because of the potential for the siRNA to be cleaved by RNase at single-stranded G residues.

[0143] With regard to the siRNA sequence itself, it has been found that siRNAs with 30-50% GC content can be more active than those with a higher G/C content in certain cases. Moreover, since a 4-6 nucleotide poly(T) tract may act as a termination signal for RNA pol III, stretches of >4 Ts or As in the target sequence may be avoided in certain cases when designing sequences to be expressed from an RNA pol III promoter. In addition, some regions of mRNA may be either highly structured or bound by regulatory proteins. Thus, it may be helpful to select siRNA target sites at different positions along the length of the gene sequence. Finally, the potential target sites can be compared to the appropriate genome database (human, mouse, rat, etc.). Any target sequences with more than 16-17 contiguous base pairs of homology to other coding sequences may be eliminated from consideration in certain cases.
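For illustration, the GC-content and homopolymer considerations described above can be expressed as a simple filter. The following Python sketch is not part of the disclosed method; the function names, the inclusive 30-50% window, and the example sequences are assumptions chosen for demonstration, and the sequences are made up rather than drawn from SEQ ID NO:1.

```python
import re


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA or RNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_long_homopolymer(seq: str, max_run: int = 4) -> bool:
    """True if seq has more than max_run consecutive Ts (or Us) or As."""
    seq = seq.upper().replace("U", "T")
    pattern = "T{%d,}|A{%d,}" % (max_run + 1, max_run + 1)
    return re.search(pattern, seq) is not None


def passes_basic_filters(seq: str, gc_min: float = 0.30, gc_max: float = 0.50) -> bool:
    """Apply the GC window (assumed inclusive) and the T/A run check."""
    return gc_min <= gc_fraction(seq) <= gc_max and not has_long_homopolymer(seq)


# Hypothetical 19-nt candidates, for demonstration only.
for candidate in ("AGTCATTGCAATCGTAAGC", "GCATTTTTGCATCGATCGA"):
    print(candidate, passes_basic_filters(candidate))
```

The comparison against a genome database mentioned above is not covered by this sketch and would require a separate homology search.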

[0144] In one embodiment, siRNA can be designed to have two inverted repeats separated by a short spacer sequence and end with a string of Ts that serve as a transcription termination site. This design produces an RNA transcript that is predicted to fold into a short hairpin siRNA. The selection of siRNA target sequence, the length of the inverted repeats that encode the stem of a putative hairpin, the order of the inverted repeats, the length and composition of the spacer sequence that encodes the loop of the hairpin, and the presence or absence of 5′-overhangs, can vary to achieve desirable results.

[0145] The siRNA targets can be selected by scanning an mRNA sequence for AA dinucleotides and recording the 19 nucleotides immediately downstream of the AA. Other methods can also be used to select the siRNA targets. In one example, the selection of the siRNA target sequence is purely empirically determined (see e.g., Sui et al., Proc. Natl. Acad. Sci. USA 99: 5515-5520, 2002), as long as the target sequence starts with GG and does not share significant sequence homology with other genes as analyzed by BLAST search. In another example, a more elaborate method is employed to select the siRNA target sequences. This procedure exploits an observation that any accessible site in endogenous mRNA can be targeted for degradation by synthetic oligodeoxyribonucleotide/RNase H method (Lee et al., Nature Biotechnology 20:500-505, 2002).
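The AA-dinucleotide scan described above can be sketched as follows. This Python example is illustrative only; the function name is hypothetical, and the toy transcript simply embeds one 21-nucleotide sequence between arbitrary flanking bases rather than reproducing any portion of SEQ ID NO:1 in context.

```python
def find_aa_n19_targets(mrna: str):
    """Yield (1-based position, 21-nt site) for each AA followed by 19 nt."""
    seq = mrna.upper().replace("U", "T")  # accept RNA or cDNA notation
    # Only start positions that leave 19 nucleotides after the AA are scanned.
    for i in range(len(seq) - 20):
        if seq[i:i + 2] == "AA":
            yield i + 1, seq[i:i + 21]


# Toy transcript: arbitrary flanks around one 21-nt candidate site.
toy_transcript = "GG" + "AATGGAATATTCGTGCGGAGG" + "CC"
for position, site in find_aa_n19_targets(toy_transcript):
    print(position, site)
```

Each reported site would still need to be checked for homology to other genes, for example by a BLAST search, before being used.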

[0146] In another embodiment, the hairpin siRNA expression cassette is constructed to contain the sense strand of the target, followed by a short spacer, the antisense strand of the target, and 5-6 Ts as transcription terminator. The order of the sense and antisense strands within the siRNA expression constructs can be altered without affecting the gene silencing activities of the hairpin siRNA. In certain instances, the reversal of the order may cause partial reduction in gene silencing activities.

[0147] The length of nucleotide sequence being used as the stem of siRNA expression cassette can range, for instance, from 19 to 29. The loop size can range from 3 to 23 nucleotides. Other lengths and/or loop sizes can also be used.
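As a non-limiting illustration, the cassette layout described above (sense strand, spacer, antisense strand, and a run of Ts as terminator) can be assembled programmatically. In the Python sketch below, the function names and the 9-nucleotide spacer are arbitrary placeholders that do not come from this disclosure; the stem and loop length limits follow the exemplary ranges stated above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string, written 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def hairpin_cassette(stem_sense: str, spacer: str = "TTCAAGAGA",
                     n_terminator_t: int = 5) -> str:
    """Assemble sense stem + spacer + antisense stem + T-run terminator.

    The default spacer is a hypothetical placeholder loop sequence.
    """
    if not 19 <= len(stem_sense) <= 29:
        raise ValueError("stem length outside the exemplary 19-29 nt range")
    if not 3 <= len(spacer) <= 23:
        raise ValueError("loop length outside the exemplary 3-23 nt range")
    if n_terminator_t not in (5, 6):
        raise ValueError("terminator is expected to be 5 or 6 Ts")
    return (stem_sense.upper() + spacer.upper()
            + reverse_complement(stem_sense) + "T" * n_terminator_t)


# 19-nt stem used purely as an example input.
print(hairpin_cassette("TGACCATTACGCCAGTCAG"))
```

Swapping the order of the two stem arms, as contemplated above, amounts to passing the reverse complement of the stem as the first argument.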

[0148] In yet another embodiment, a 5′ overhang in the hairpin siRNA construct can be used, provided that the hairpin siRNA is functional in gene silencing. In one specific example, the 5′ overhang includes about 6 nucleotide residues.

[0149] In a preferred embodiment, the target sequence for RNAi is a 21-mer sequence fragment selected from SEQ ID NO:1. The 5′ end of the target sequence has dinucleotide “NA,” where “N” can be any base and “A” represents adenine. The remaining 19-mer sequence has a GC content of between 45% and 55%. In addition, the remaining 19-mer sequence does not include (1) any three consecutive identical bases (i.e., GGG, CCC, TTT, or AAA); (2) seven “GC” in a row; and (3) any palindrome sequence with 5 or more bases. Furthermore, the target sequence has low sequence homology to other human genes. In one specific example, potential target sequences are searched by BLASTN against NCBI's human UniGene cluster sequence database. The human UniGene database contains nonredundant sets of gene-oriented clusters. Each UniGene cluster includes sequences that represent a unique gene. Fragments of SEQ ID NO:1 that produce no hit to other human genes under BLASTN search are selected as the preferred candidate sequences for RNAi. During the search, the e-value may be set at a stringent value (such as “1”). Table 4 lists exemplary NRHK1 gene target sequences for RNAi prepared using the above-described criteria. The siRNA sequences for each target sequence (the sense strand and the antisense strand) are also disclosed. In addition, the 5′ end location of each target sequence in SEQ ID NO:1 is identified (“5′ End”).
TABLE 4
Exemplary RNAi Target Sequences of the NRHK1 Gene and the Corresponding siRNAs

Target Sequence (SEQ ID NO)           5′ End  siRNA Sense Strand (SEQ ID NO)        siRNA Antisense Strand (SEQ ID NO)
AATGGAATATTCGTGCGGAGG (SEQ ID NO:7)   559     UGGAAUAUUCGUGCGGAGGUU (SEQ ID NO:8)   UUACCUUAUAAGCACGCCUC (SEQ ID NO:9)
AATATTCGTGCGGAGGAAGAC (SEQ ID NO:10)  564     UAUUCGUGCGGAGGAAGACUU (SEQ ID NO:11)  UUAUAAGCACGCCUCCUUCUG (SEQ ID NO:12)
AAGTTCCTGATGATCCTGCCA (SEQ ID NO:13)  1362    GUUCCUGAUGAUCCUGCCAUU (SEQ ID NO:14)  UUAUAAGCACGCCUCCUUCUG (SEQ ID NO:15)
CATCACCTTCTTGAGAGGCTC (SEQ ID NO:16)  878     UCACCUUCUUGAGAGGCUCUU (SEQ ID NO:17)  UUAGUGGAAGAACUCUCCGAG (SEQ ID NO:18)
CATCACCTTCTTGAGAGGCTC (SEQ ID NO:19)  1361    AGUUCCUGAUGAUCCUGCCUU (SEQ ID NO:20)  UUUCAAGGACUACUAGGACGG (SEQ ID NO:21)
GATGACCATTACGCCAGTCAG (SEQ ID NO:22)  186     UGACCAUUACGCCAGUCAGUU (SEQ ID NO:23)  UUACUGGUAAUGCGGUCAGUC (SEQ ID NO:24)
GAATATTCGTGCGGAGGAAGA (SEQ ID NO:25)  563     AUAUUCGUGCGGAGGAAGAUU (SEQ ID NO:26)  UUUAUAAGCACGCCUCCUUCU (SEQ ID NO:27)
GAAGGCCGTCCTGAAGACAAT (SEQ ID NO:28)  755     AGGCCGUCCUGAAGACAAUUU (SEQ ID NO:29)  UUUCCGGCAGGACUUCUGUUA (SEQ ID NO:30)
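The selection rules of paragraph [0149] and the relationship between a target sequence and its siRNA strands can be sketched in Python as shown below. This is a hedged illustration, not the procedure actually used to prepare Table 4: the function names are hypothetical, the 45-55% bounds are assumed to be inclusive, and the “seven GC in a row” and palindrome rules are omitted because the text does not define them precisely. The antisense strand is returned 5′→3′; Table 4 appears to print the antisense strand in the opposite orientation, aligned beneath the sense strand, which is an inference from the listed sequences.

```python
RNA_COMPLEMENT_OF_DNA = str.maketrans("ACGT", "UGCA")


def passes_target_rules(target21: str) -> bool:
    """Check the unambiguous rules: 21-mer, 'NA' start, GC window, no triplets."""
    t = target21.upper()
    if len(t) != 21 or set(t) - set("ACGT"):
        return False
    if t[1] != "A":  # 5' dinucleotide must be "NA"
        return False
    core = t[2:]  # the remaining 19-mer
    gc = (core.count("G") + core.count("C")) / len(core)
    if not 0.45 <= gc <= 0.55:
        return False
    # Reject any three consecutive identical bases (AAA, CCC, GGG, TTT).
    return not any(base * 3 in core for base in "ACGT")


def sirna_strands(target21: str):
    """Return (sense, antisense) RNA strands, each 5'->3' with a UU overhang."""
    core = target21.upper()[2:]
    sense = core.replace("T", "U") + "UU"
    antisense = core.translate(RNA_COMPLEMENT_OF_DNA)[::-1] + "UU"
    return sense, antisense


target = "GATGACCATTACGCCAGTCAG"  # SEQ ID NO:22 as listed in Table 4
sense, antisense = sirna_strands(target)
print(passes_target_rules(target), sense, antisense)
```

For this target, the sense strand produced by the sketch matches SEQ ID NO:23, and the antisense strand read in reverse matches SEQ ID NO:24 as printed in Table 4.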

[0150] In yet another embodiment, the polynucleotides of the present invention can be modified at the base moiety, sugar moiety or phosphate backbone to improve the stability, hybridization, or solubility of the molecules. For instance, the deoxyribose phosphate backbone of the polynucleotide molecules can be modified to generate peptide polynucleotides (see Hyrup B. et al. Bioorganic & Medicinal Chemistry 4:5-23, 1996). As used herein, the terms “peptide polynucleotides” or “PNAs” refer to polynucleotide mimics, e.g., DNA mimics, in which the deoxyribose phosphate backbone is replaced by a pseudopeptide backbone and only the four natural nucleobases are retained. The neutral backbone of PNAs has been shown to allow for specific hybridization to DNA and RNA under conditions of low ionic strength. PNA oligomers can be synthesized using standard solid phase peptide synthesis protocols.

[0151] PNAs can be used in therapeutic and diagnostic applications. For example, PNAs can be used as antisense agents for sequence-specific modulation of the NRHK1 gene expression. PNAs can also be used in the analysis of single base pair mutations in a gene, (e.g., by PNA-directed PCR clamping); as artificial restriction enzymes when used in combination with other enzymes, (e.g., S1 nucleases); or as probes or primers for DNA sequencing or hybridization.

[0152] In one embodiment, PNAs can be modified to enhance their stability or cellular uptake by attaching lipophilic or other helper groups to PNA, by the formation of PNA-DNA chimeras, or by the use of liposomes or other drug delivery techniques known in the art. For example, PNA-DNA chimeras of the polynucleotides of the invention can be generated. These chimeras allow DNA recognition enzymes, such as RNase H and DNA polymerases, to interact with the DNA portion while the PNA portion provides high binding affinity and specificity. PNA-DNA chimeras can be linked using linkers of appropriate lengths which are selected based on base stacking, number of bonds between the nucleobases, and orientations. The PNA-DNA chimeras can be synthesized as follows. A DNA chain is synthesized on a solid support using standard phosphoramidite coupling chemistry and modified nucleoside analogs. PNA monomers are then coupled in a stepwise manner to produce a chimeric molecule with a 5′ PNA segment and a 3′ DNA segment. Alternatively, chimeric molecules can be synthesized with a 5′ DNA segment and a 3′ PNA segment.

[0153] In other embodiments, the polynucleotides of this invention may include other appended groups such as peptides (e.g., for targeting host cell receptors in vivo), or agents facilitating transportation across the cell membrane or the blood-kidney barrier (see, e.g., PCT Publication No. WO89/10134). In addition, polynucleotides can be modified using hybridization-triggered cleavage agents or intercalating agents. To this end, the polynucleotides can be conjugated to another molecule (e.g., a peptide, hybridization triggered cross-linking agent, transport agent, or hybridization-triggered cleavage agent). Furthermore, the polynucleotide can be detectably labeled.

[0154] Polypeptides and Variants Thereof

[0155] Several aspects of the invention pertain to isolated NRHK1 polypeptides and mutated NRHK1 polypeptides capable of inhibiting normal NRHK1 activity. The present invention also contemplates immunogenic polypeptide fragments suitable for raising anti-NRHK1 antibodies.

[0156] In one embodiment, native NRHK1 polypeptides can be isolated from cells or tissue sources by using standard protein purification techniques. Standard purification methods include electrophoresis, molecular, immunological and chromatographic techniques. Specific examples include ion exchange, hydrophobic, affinity or reverse-phase HPLC chromatography, and chromatofocusing. In one embodiment, NRHK1 polypeptides are purified using a standard affinity column coupled with anti-NRHK1 antibodies. Ultrafiltration and diafiltration techniques can also be used. The degree of purification depends on the purpose of the use of the NRHK1 polypeptides. In some instances, purification is not necessary.

[0157] In another embodiment, NRHK1 polypeptides or mutated NRHK1 polypeptides capable of inhibiting normal NRHK1 activity are produced by recombinant DNA techniques. Alternative to recombinant expression, NRHK1 polypeptides or mutated NRHK1 polypeptides can be synthesized chemically using standard peptide synthesis techniques.

[0158] The invention provides NRHK1 polypeptides encoded by the human NRHK1 gene, or homologs thereof. The polypeptides of this invention can be substantially homologous to human NRHK1 kinase (SEQ ID NO:2). Preferably, these polypeptides retain the biological activity of the native NRHK1 kinase. In one embodiment, the polypeptides comprise an amino acid sequence which is at least about 85%, 90%, 95%, 98% or more homologous to SEQ ID NO:2.

[0159] Comparison of sequences and determination of percent identity between two sequences can be accomplished using a mathematical algorithm. The percent identity between two amino acid sequences can be determined using the Needleman and Wunsch (J. Mol. Biol. 48:444-453, 1970) algorithm, or the GAP program in the GCG software package which uses either a BLOSUM 62 matrix or a PAM250 matrix, and a gap weight of 16, 14, 12, 10, 8, 6, or 4 and a length weight of 1, 2, 3, 4, 5, or 6. The percent identity between two nucleotide sequences can be determined using the GAP program in the GCG software package, which uses a NWSgapdna.CMP matrix and a gap weight of 40, 50, 60, 70, or 80 and a length weight of 1, 2, 3, 4, 5, or 6. The percent identity between two amino acid or nucleotide sequences can also be determined using the algorithm of E. Meyers and W. Miller (CABIOS, 4:11-17, 1989) which has been incorporated into the ALIGN program (version 2.0), or the pairwise BLAST program available at NCBI's BLAST web site.
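To illustrate how a global alignment yields a percent identity, a minimal Needleman-Wunsch sketch in Python is given below. It is a simplification made for demonstration: it uses a flat match/mismatch score and a linear gap penalty (assumed values) instead of the BLOSUM 62 or PAM250 matrices and the gap and length weights of the GAP program, and it defines percent identity as identical aligned positions divided by alignment length, which is only one of several conventions. The example peptides are made up.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Globally align a and b; return the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring the diagonal move.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of alignment length."""
    if not a and not b:
        raise ValueError("both sequences are empty")
    al_a, al_b = needleman_wunsch(a, b)
    matches = sum(x == y and x != "-" for x, y in zip(al_a, al_b))
    return 100.0 * matches / len(al_a)


print(percent_identity("MKTAYIAK", "MKTAHIAK"))  # hypothetical peptides; prints 87.5
```

With a substitution matrix and affine gap penalties, as in the programs named above, the optimal alignment and therefore the reported percent identity may differ from what this simplified scoring produces.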

[0160] The polypeptide and polynucleotide sequences of the present invention can be used as query sequences for searching public databases in order to identify similar sequences. The search can be conducted using BLAST programs, such as the protein BLAST, nucleotide BLAST, pairwise BLAST, and genomic BLAST, that are available at the BLAST web site maintained by the NCBI. When using BLAST programs, the default parameters of the respective programs can also be used.

[0161] The invention further provides chimeric or fusion NRHK1 polypeptides. A fusion NRHK1 polypeptide contains an NRHK1-related polypeptide and a non-NRHK1 polypeptide. The NRHK1-related polypeptides include all or a portion of SEQ ID NO:2 or its variant. A peptide linker sequence can be employed to separate the NRHK1-related polypeptide from the non-NRHK1 polypeptide components by a distance sufficient to ensure that each polypeptide folds into its native secondary and tertiary structures. Such a peptide linker sequence is incorporated into the fusion protein using standard techniques well known in the art. Suitable peptide linker sequences can be chosen based on the following factors: (1) their ability to adopt a flexible extended conformation; (2) their inability to adopt a secondary structure that could interact with functional epitopes on the NRHK1-related polypeptide and non-NRHK1 polypeptide; and (3) the lack of hydrophobic or charged residues that might react with the polypeptide functional epitopes. Preferred peptide linker sequences contain Gly, Asn and Ser residues. Other near neutral amino acids, such as Thr and Ala, can also be used in the linker sequence. Amino acid sequences suitable as linkers include those disclosed in Maratea et al., Gene 40:39-46, 1985; Murphy et al., Proc. Natl. Acad. Sci. USA 83:8258-8262, 1986; U.S. Pat. Nos. 4,935,233 and 4,751,180. The linker sequences may be from 1 to about 50 amino acids in length. Linker sequences are not required when the NRHK1-related polypeptide or the non-NRHK1 polypeptide has non-essential N-terminal amino acid regions that can be used to separate the respective functional domains and thereby prevent steric interference.

[0162] In one embodiment, the fusion protein is a GST-NRHK1 fusion protein in which an NRHK1-related sequence, such as SEQ ID NO:2, is fused to the C-terminus of the GST sequence. This fusion protein can facilitate the purification of the recombinant NRHK1.

[0163] The NRHK1-fusion proteins of the invention can be incorporated into pharmaceutical compositions and administered to a subject. The NRHK1-fusion proteins can be used to affect the bioavailability of an NRHK1 substrate. The NRHK1-fusion proteins can also be used for the treatment or prevention of damages caused by (i) aberrant modification or mutation of NRHK1, or (ii) aberrant post-translational modification of NRHK1. It is also conceivable that a fusion protein containing a normal or mutated NRHK1 polypeptide, or a fragment thereof, can be used to inhibit NRHK1 activity in a human subject.

[0164] Moreover, the NRHK1-fusion proteins can be used as immunogens to produce anti-NRHK1 antibodies. They can also be used to purify NRHK1 ligands and to screen for molecules capable of inhibiting the interaction between NRHK1 and its substrates.

[0165] Preferably, the NRHK1-chimeric or fusion proteins of the invention are produced using standard recombinant DNA techniques. Commercially available expression vectors which encode a fusion moiety (e.g., a GST polypeptide) can be used.

[0166] A signal sequence can be used to facilitate secretion and isolation of the secreted protein or other proteins of interest. Signal sequences are typically characterized by a core of hydrophobic amino acids which are generally cleaved from the mature protein. Such signal peptides contain processing sites that allow cleavage of the signal sequence from the mature proteins as they pass through the secretory pathway. The present invention encompasses NRHK1 polypeptides having a signal sequence, or the polynucleotide sequences encoding the same.

[0167] The present invention also pertains to NRHK1 mutants which function as antagonists to NRHK1. In one embodiment, antagonists of NRHK1 are used as therapeutic agents. For example, a mutant of NRHK1 that forms a non-functional dimer with a wild-type NRHK1 (the so-called dominant negative mutant) can decrease the activity of NRHK1 and may ameliorate diseases in a subject in which the level or activity of NRHK1 is abnormally increased. Dominant negative NRHK1 mutants can be generated by mutagenesis, as appreciated by one skilled in the art.

[0168] NRHK1 mutants which function as either NRHK1 agonists or antagonists can be identified by screening combinatorial libraries of mutants. A variegated library of NRHK1 mutants can be produced by, for example, enzymatically ligating a mixture of synthetic oligonucleotides into gene sequences such that a degenerate set of potential NRHK1 sequences is expressible as individual polypeptides, or alternatively, as a set of larger fusion proteins containing the set of NRHK1 sequences therein. There are a variety of methods which can be used to produce libraries of potential NRHK1 mutants from a degenerate oligonucleotide sequence. A degenerate gene sequence can be chemically synthesized using an automatic DNA synthesizer. The synthetic gene can then be ligated into an appropriate expression vector.
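The way a single degenerate oligonucleotide encodes a set of sequences can be illustrated by enumerating its members. The following is a minimal sketch using the standard IUPAC nucleotide ambiguity codes; the function name `expand_degenerate` and the example oligonucleotides are hypothetical and are not taken from the NRHK1 sequence.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_degenerate(oligo):
    """List every unambiguous sequence encoded by a degenerate oligo."""
    pools = [IUPAC_CODES[base] for base in oligo.upper()]
    return ["".join(bases) for bases in product(*pools)]


if __name__ == "__main__":
    # A degenerate NNK codon stands for 4 * 4 * 2 = 32 distinct codons.
    print(len(expand_degenerate("NNK")))
    print(expand_degenerate("ARC"))
```

The number of members grows multiplicatively with each degenerate position, which is why a short degenerate stretch can represent a large variegated library.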

[0169] In one embodiment, a library of coding sequences can be generated using nucleases. For instance, double-stranded PCR fragments of the NRHK1 coding sequence can be treated with a nuclease which produces about one nick per molecule. The double-stranded DNAs are then subjected to a cycle of denaturing and re-naturing. The newly reformed DNAs, which may include sense/antisense pairs from different nicked products, are treated with S1 nuclease to remove single-stranded portions. Using this method, an expression library which encodes N-terminal, C-terminal or internal fragments of NRHK1 can be derived.

[0170] In addition, recursive ensemble mutagenesis (REM), a technique which enhances the frequency of functional mutants in the libraries, can be used to prepare NRHK1 mutants (Delagrave et al., Protein Engineering 6:327-331, 1993).

[0171] NRHK1 fragments, or variants thereof, can also be generated using synthetic means, such as solid-phase synthesis methods. Preferably, the synthesized fragment has less than about 100 amino acids, or preferably, less than about 50 amino acids.

[0172] Antibodies

[0173] In accordance with another aspect of the present invention, antibodies specific to NRHK1 or its variants are prepared. An antibody is considered to bind “specifically” to an antigen if the binding affinity between the antibody and the antigen is equal to, or greater than 10⁵ M⁻¹. The antibodies can be monoclonal or polyclonal. Preferably, the antibodies are monoclonal. More preferably, the antibodies are humanized antibodies.
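The specificity threshold above is stated as an association constant. Because binding data are often reported as a dissociation constant (K_d), the relationship K_a = 1/K_d can be used to compare a measurement against the threshold. The sketch below assumes K_d is given in molar units; the function names and the example values are hypothetical.

```python
SPECIFIC_BINDING_THRESHOLD = 1e5  # M^-1, the threshold stated in the text


def association_constant(kd_molar):
    """Convert a dissociation constant (M) to an association constant (M^-1)."""
    if kd_molar <= 0:
        raise ValueError("dissociation constant must be positive")
    return 1.0 / kd_molar


def binds_specifically(kd_molar, threshold=SPECIFIC_BINDING_THRESHOLD):
    """True if the affinity is equal to or greater than the threshold."""
    return association_constant(kd_molar) >= threshold


if __name__ == "__main__":
    # Hypothetical measurements: 1 micromolar and 100 micromolar K_d.
    for kd in (1e-6, 1e-4):
        print(kd, association_constant(kd), binds_specifically(kd))
```

Under this relationship, an affinity of 10⁵ M⁻¹ corresponds to a dissociation constant of about 10 micromolar.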

[0174] Polyclonal anti-NRHK1 antibodies can be prepared by immunizing a suitable subject with NRHK1 or fragments thereof. The anti-NRHK1 antibody titer in the immunized subject can be monitored over time using standard techniques, such as ELISA. The anti-NRHK1 antibody can be isolated from the immunized subject using well known techniques.

[0175] In one embodiment, hybridomas capable of producing anti-NRHK1 antibodies are prepared. Purified NRHK1 or its variants, or fragments thereof, are used to immunize a vertebrate, such as a mammal. Suitable mammals include mice, rabbits and sheep. Preferably, the fragment used for immunization comprises at least 8 amino acid residues, more preferably at least 12 amino acid residues, highly preferably at least 16 amino acid residues, and most preferably at least 20 amino acid residues.

[0176] Immunogenic fragments (epitopes) of NRHK1 can be identified using well known techniques. In general, any fragment of SEQ ID NO:2 can be used to raise antibodies specific to NRHK1. Preferred epitopes are regions that are located on the surface of NRHK1. These regions are usually hydrophilic.
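One well known way to look for hydrophilic, and therefore likely surface-exposed, regions is a sliding-window hydropathy scan. The following is a minimal sketch that uses the Kyte-Doolittle hydropathy scale, on which more negative values are more hydrophilic. The function names, the window size and the toy sequence are assumptions chosen for illustration; the sequence is not SEQ ID NO:2.

```python
# Kyte-Doolittle hydropathy values (more negative = more hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(sequence, window=8):
    """Return (start index, fragment, mean hydropathy) for each window."""
    sequence = sequence.upper()
    results = []
    for start in range(len(sequence) - window + 1):
        fragment = sequence[start:start + window]
        mean = sum(KYTE_DOOLITTLE[res] for res in fragment) / window
        results.append((start, fragment, mean))
    return results


def most_hydrophilic_window(sequence, window=8):
    """Return the window with the lowest mean hydropathy."""
    windows = hydropathy_windows(sequence, window)
    if not windows:
        raise ValueError("sequence is shorter than the window")
    return min(windows, key=lambda item: item[2])


if __name__ == "__main__":
    toy_sequence = "LLLLLKKKKKLLLLL"  # hypothetical, for illustration only
    print(most_hydrophilic_window(toy_sequence, window=5))
```

A scan of this kind only ranks candidate regions; whether a given fragment is actually immunogenic would still have to be determined experimentally.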

[0177] Splenocytes are isolated from the immunized vertebrate and fused with an immortalized cell line (such as a myeloma) to form hybridomas. Preferably, the immortal cell line is derived from the same mammalian species as the lymphocytes. For example, murine hybridomas can be made by fusing an immortalized mouse cell line with lymphocytes isolated from a mouse that is immunized with an immunogenic preparation of the present invention. Preferred immortalized cell lines include mouse myeloma cell lines that are sensitive to culture medium containing hypoxanthine, aminopterin and thymidine (“HAT medium”). Suitable myeloma cell lines include, but are not limited to, the P3-NS1/1-Ag4-1, P3-x63-Ag8.653 or Sp2/0-Ag14 myeloma lines, all of which are available from ATCC. In one embodiment, HAT-sensitive mouse myeloma cells are fused to mouse splenocytes using polyethylene glycol (“PEG”). Hybridoma cells thus produced are selected using HAT medium, which kills unfused or unproductively fused myeloma cells. Hybridoma cells which produce monoclonal anti-NRHK1 antibodies are then detected by screening the hybridoma culture supernatants.

[0178] A monoclonal anti-NRHK1 antibody can also be prepared by screening a recombinant combinatorial immunoglobulin library (e.g., an antibody phage display library). Kits for generating and screening phage display libraries are commercially available (e.g., the Pharmacia Recombinant Phage Antibody System, Catalog No. 27-9400-01; and the Stratagene SurfZAP™ Phage Display Kit, Catalog No. 240612).

[0179] The anti-NRHK1 antibodies of the present invention also include “single-chain Fv” or “scFv.” The scFv fragments comprise the V_(H) and V_(L) domains of an antibody. Generally, the scFv polypeptide further comprises a polypeptide linker between the V_(H) and V_(L) domains. The polypeptide linker enables the scFv to form the desired structure for antigen binding. Additionally, recombinant anti-NRHK1 antibodies, such as chimeric and humanized monoclonal antibodies, can be prepared, as appreciated by one of ordinary skill in the art.

[0180] Humanized antibodies are particularly desirable for therapeutic treatment of human subjects. Humanized forms of non-human (e.g., murine) antibodies are chimeric immunoglobulins, immunoglobulin chains, or fragments thereof (such as Fv, Fab, Fab′, F(ab′)₂ or other antigen-binding subsequences of antibodies) which contain minimal sequence derived from non-human immunoglobulin. Humanized antibodies are derived from human immunoglobulins in which the residues forming the complementarity determining regions (CDRs) are replaced by the residues from CDRs of a non-human antibody, such as a mouse, rat or rabbit antibody having the desired specificity, affinity and capacity. In some instances, Fv framework residues of the human immunoglobulin are replaced by corresponding non-human residues. Humanized antibodies may also comprise residues which are found neither in the recipient antibody nor in the imported CDR or framework sequences. The humanized antibody can comprise at least one or two variable domains, in which all or substantially all of the CDR regions correspond to those of a non-human immunoglobulin and all or substantially all of the constant regions are those of a human immunoglobulin consensus sequence. The humanized antibody preferably comprises at least a portion of an immunoglobulin constant region (Fc) of a human immunoglobulin.

[0181] Humanized antibodies can be produced using transgenic mice which are incapable of expressing endogenous immunoglobulin heavy and light chains but can express human heavy and light chains. The transgenic mice are immunized in the normal fashion with a selected antigen. Monoclonal antibodies directed against the antigen can be obtained using conventional hybridoma technology. The human immunoglobulin transgenes harbored in the transgenic mice rearrange during B cell differentiation, and subsequently undergo class switching and somatic mutation. Using this technique, therapeutically useful IgG, IgA and IgE antibodies can be prepared.

[0182] In addition, humanized antibodies which recognize a selected epitope can be generated using a technique referred to as “guided selection.” In this approach a selected non-human monoclonal antibody, e.g., a murine antibody, is used to guide the selection of a humanized antibody recognizing the same epitope.

[0183] In a preferred embodiment, the antibodies to NRHK1 are capable of reducing or eliminating the biological function of NRHK1. Preferably, the antibodies reduce at least 25% of NRHK1 activity. More preferably, the antibodies reduce at least about 50% of the activity. Highly preferably, the antibodies reduce about 95-100% of NRHK1 activity.
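The reduction thresholds above can be expressed as a simple calculation. The following is a minimal sketch in which the percent reduction is computed from NRHK1 activity measured without and with the antibody; the function names, the tier labels and the example readings are hypothetical, and "about 50%" and "about 95-100%" are treated here as hard cutoffs of 50 and 95 for illustration only.

```python
def percent_reduction(activity_without_antibody, activity_with_antibody):
    """Percent of NRHK1 activity removed by the antibody."""
    if activity_without_antibody <= 0:
        raise ValueError("baseline activity must be positive")
    lost = activity_without_antibody - activity_with_antibody
    return 100.0 * lost / activity_without_antibody


def preference_tier(reduction_percent):
    """Map a percent reduction onto the tiers described in the text."""
    if reduction_percent >= 95:
        return "highly preferred"
    if reduction_percent >= 50:
        return "more preferred"
    if reduction_percent >= 25:
        return "preferred"
    return "below the preferred range"


if __name__ == "__main__":
    # Hypothetical assay readings in arbitrary activity units.
    reduction = percent_reduction(200.0, 50.0)
    print(reduction, preference_tier(reduction))
```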

[0184] Anti-NRHK1 antibodies can be used to isolate NRHK1. Suitable methods include affinity chromatography and immunoprecipitation. Moreover, anti-NRHK1 antibodies can be used to evaluate the expression level of NRHK1. Anti-NRHK1 antibodies can also be used to monitor NRHK1 level as part of a clinical testing procedure, or to determine the efficacy of a given treatment regimen. Detection can be facilitated by coupling the antibody to a detectable substance. Examples of detectable substances include various enzymes, prosthetic groups, fluorescent materials, luminescent materials, bioluminescent materials, and radioactive materials. Examples of suitable enzymes include horseradish peroxidase, alkaline phosphatase, galactosidase, or acetylcholinesterase; examples of suitable prosthetic group complexes include streptavidin/biotin and avidin/biotin; examples of suitable fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an example of a luminescent material includes luminol; examples of bioluminescent materials include luciferase, luciferin, and aequorin; and examples of suitable radioactive materials include ¹²⁵I, ¹³¹I, ³⁵S or ³H.

[0185] Anti-NRHK1 antibodies are also useful for targeting a therapeutic agent/drug to a particular cell or tissue. The therapeutic agent/drug may be coupled to an antibody, either covalently or non-covalently. For instance, a therapeutic agent can be coupled to an antibody via a linker group. A linker group can function as a spacer to separate the antibody from the agent so as to avoid interference with the antibody's binding capabilities. The linker group can also serve to increase the chemical reactivity of a substituent on the agent or the antibody, and thus increase the coupling efficiency. A variety of bifunctional or polyfunctional reagents, either homo- or hetero-functional (such as those described in the catalog of the Pierce Chemical Co., Rockford, Ill.), can be employed as the linker group. Coupling may be effected, for example, through amino groups, carboxyl groups, sulfhydryl groups or oxidized carbohydrate residues. There are numerous references describing this methodology. See, e.g., U.S. Pat. No. 4,671,958.

[0186] Where a therapeutic agent is more potent when free from the antibody, it may be desirable to use a linker group which is cleavable during or upon internalization into the target cell. A number of different cleavable linker groups have been described. The mechanisms for the intracellular release of an agent from these linker groups include cleavage by reduction of a disulfide bond (e.g., U.S. Pat. No. 4,489,710), by irradiation of a photolabile bond (e.g., U.S. Pat. No. 4,625,014), by hydrolysis of derivatized amino acid side chains (e.g., U.S. Pat. No. 4,638,045), by serum complement-mediated hydrolysis (e.g., U.S. Pat. No. 4,671,958), or by acid-catalyzed hydrolysis (e.g., U.S. Pat. No. 4,569,789).

[0187] It may also be desirable to couple more than one agent to an antibody. In one embodiment, multiple agents are coupled to one antibody molecule. In another embodiment, at least two different types of agents are coupled to one antibody. Regardless of the particular embodiment, immunoconjugates coupled with more than one agent can be prepared in a variety of ways, as appreciated by one of ordinary skill in the art.

[0188] Vectors, Expression Vectors and Gene Delivery Vectors

[0189] Another aspect of the invention pertains to vectors containing a polynucleotide encoding NRHK1 or a portion thereof. One type of vector is a “plasmid,” which includes a circular double stranded DNA into which additional DNA segments can be introduced. Vectors also include expression vectors and gene delivery vectors.

[0190] The expression vectors of the present invention comprise a polynucleotide encoding NRHK1 or a portion thereof. The expression vectors also include one or more regulatory sequences operably linked to the polynucleotide being expressed. These regulatory sequences are selected based on the type of host cells. It will be appreciated by those skilled in the art that the design of the expression vector depends on such factors as the choice of the host cells and the desired expression levels. NRHK1 can be expressed in bacterial cells such as E. coli, insect cells (using baculovirus expression vectors), yeast cells or mammalian cells. The expression vector can also be transcribed and translated in vitro, for example, by using T7 promoter regulatory sequences and T7 polymerase.

[0191] Expression of proteins in prokaryotes is most often carried out in E. coli with vectors containing constitutive or inducible promoters directing the expression of either fusion or non-fusion proteins. Fusion vectors add a number of amino acids to a protein encoded therein, usually to the amino terminus of the recombinant protein. Such fusion vectors typically serve three purposes: 1) to increase expression of the recombinant protein; 2) to increase the solubility of the recombinant protein; and 3) to aid in the purification of the recombinant protein by acting as a ligand in affinity purification. Often, in fusion expression vectors, a proteolytic cleavage site is introduced at the junction of the fusion moiety and the recombinant protein to enable separation of the recombinant protein from the fusion moiety subsequent to purification of the fusion protein. Suitable cleavage enzymes include Factor Xa, thrombin and enterokinase. Examples of fusion expression vectors include pGEX (Pharmacia, Piscataway, N.J.), pMAL (New England Biolabs, Beverly, Mass.) and pRIT5 (Pharmacia, Piscataway, N.J.). Purified fusion proteins can be utilized in NRHK1 activity assays, or to generate antibodies specific for NRHK1.

[0192] Examples of suitable inducible non-fusion E. coli expression vectors include pTrc and pET 11d. Target gene expression from the pTrc vector relies on host RNA polymerase transcription from a hybrid trp-lac fusion promoter. Target gene expression from the pET 11d vector relies on transcription from a T7 gn10-lac fusion promoter mediated by a co-expressed viral RNA polymerase (T7 gn1). This viral polymerase is supplied by host strains BL21(DE3) or HMS174(DE3) from a resident prophage harboring a T7 gn1 gene under the transcriptional control of the lacUV5 promoter.

[0193] One strategy to maximize recombinant protein expression in E. coli is to express the protein in host bacteria that have an impaired capacity to proteolytically cleave the recombinant protein. Another strategy is to alter the polynucleotide sequence encoding the protein so that the individual codons for each amino acid are those preferentially utilized in E. coli.

[0194] In another embodiment, the NRHK1 expression vector is a yeast expression vector. Examples of yeast expression vectors include pYepSec1, pMFa, pJRY88, pYES2 (Invitrogen Corporation, San Diego, Calif.), and pPicZ (Invitrogen Corporation, San Diego, Calif.).

[0195] Alternatively, NRHK1 or its variant can be expressed in insect cells using baculovirus expression vectors. Suitable baculovirus vectors include the pAc series and the pVL series.

[0196] In yet another embodiment, NRHK1 or its variant is expressed in mammalian cells using a mammalian expression vector. Examples of mammalian expression vectors include pCDM8 and pMT2PC. When used in mammalian cells, the expression vector's control functions are often provided by viral regulatory elements. For example, commonly used promoters are derived from polyoma, adenovirus 2, cytomegalovirus and Simian Virus 40.

[0197] In another embodiment, the mammalian expression vector contains tissue-specific regulatory elements. Examples of suitable tissue-specific promoters include the liver-specific albumin promoter, lymphoid-specific promoters, promoters of T cell receptors and immunoglobulins, neuron-specific promoters (e.g., the neurofilament promoter), pancreas-specific promoters, and mammary gland-specific promoters (e.g., milk whey promoter). Developmentally-regulated promoters are also contemplated, which include, for example, the α-fetoprotein promoter.

[0198] The present invention also provides a recombinant expression vector comprising a polynucleotide which encodes NRHK1 but is cloned into the expression vector in an antisense orientation. Regulatory sequences that are operatively linked to the antisense-oriented polynucleotide can be chosen to direct continuous expression of the antisense RNA molecule in a variety of cell types. Suitable regulatory sequences include viral promoters and/or enhancers. Regulatory sequences can also be chosen to direct constitutive, tissue specific or cell type specific expression of the antisense RNA. The antisense expression vector can be in the form of a recombinant plasmid, phagemid, or attenuated virus in which antisense polynucleotides are produced under the control of a highly efficient regulatory region.

[0199] The present invention further provides gene delivery vehicles for delivering polynucleotides to mammals. A polynucleotide sequence of the invention can be administered either locally or systemically via a gene delivery vehicle. Expression of the polynucleotide can be induced using endogenous mammalian or heterologous promoters. Expression of the polynucleotide in vivo can be either constitutive or regulated. The gene delivery vehicles preferably are viral vectors, including retroviral, lentiviral, adenoviral, adeno-associated viral (AAV), herpes viral, or alphavirus vectors. The viral vectors can also be astrovirus, coronavirus, orthomyxovirus, papovavirus, paramyxovirus, parvovirus, picornavirus, poxvirus, or togavirus vectors.

[0200] Delivery of gene therapy constructs is not limited to the above mentioned viral vectors. Other delivery methods can also be employed. These methods include nucleic acid expression vectors, polycationic condensed DNA linked or unlinked to killed adenovirus, ligand linked DNA, liposome-DNA conjugates, gene guns, ionizing radiation, nucleic charge neutralization, or fusion with cell membranes. Naked DNA can also be employed. Uptake efficiency of the naked DNA may be improved using biodegradable latex beads. This method can be further improved by treating the beads to increase their hydrophobicity.

[0201] Regulatable Expression Systems

[0202] Another aspect of the present invention pertains to the use of regulatable expression systems to express desirable polynucleotides or polypeptides in cells. Systems suitable for this invention are briefly described below:

[0203] Tet-on/off system. The Tet-system is based on two regulatory elements derived from the tetracycline-resistance operon of the E. coli Tn10 transposon: the tet repressor protein (TetR) and the Tet operator DNA sequence (tetO) to which TetR binds (Gossen et al., Science 268: 1766-1769, 1995). The system consists of two components, a “regulator” and a “reporter” plasmid. The “regulator” plasmid encodes a hybrid protein containing a mutated Tet repressor (rtetR) fused to the VP16 activation domain of herpes simplex virus. The “reporter” plasmid contains a tet-responsive element (TRE), which controls the “reporter” gene of choice. The rtetR-VP16 fusion protein binds to the TRE, and thereby activates transcription of the “reporter” gene, only in the presence of tetracycline. The system has been incorporated into a number of viral vectors including retrovirus, adenovirus and AAV.

[0204] Ecdysone system. The Ecdysone system is based on the molting induction system found in Drosophila, but modified for inducible expression in mammalian cells. The system uses an analog of the Drosophila steroid hormone ecdysone, muristerone A, to activate expression of the gene of interest via a heterodimeric nuclear receptor. Expression levels have been reported to exceed 200-fold over basal levels with no effect on mammalian cell physiology (No et al., Proc. Natl. Acad. Sci. USA 93: 3346-3351, 1996).

[0205] Progesterone-system. The progesterone receptor is normally stimulated to bind to a specific DNA sequence and to activate transcription through an interaction with its hormone ligand. Conversely, the progesterone antagonist mifepristone (RU486) is able to block hormone-induced nuclear transport and subsequent DNA binding. A mutant form of the progesterone receptor that can be stimulated to bind through an interaction with RU486 has been generated. To generate a specific, regulatable transcription factor, the RU486-binding domain of the progesterone receptor has been fused to the DNA-binding domain of the yeast transcription factor GAL4 and the transactivation domain of the HSV protein VP16. The chimeric factor is inactive in the absence of RU486. The addition of hormone, however, induces a conformational change in the chimeric protein, and this change allows binding to a GAL4-binding site and the activation of transcription from promoters containing the GAL4-binding site (Wang et al., Nat. Biotech 15: 239-243, 1997).

[0206] Rapamycin-system. Immunosuppressive agents, such as FK506 and rapamycin, act by binding to specific cellular proteins and facilitating their dimerization. For example, the binding of rapamycin to FK506-binding protein (FKBP) results in its heterodimerization with another rapamycin binding protein FRAP, which can be reversed by removal of the drug. The ability to bring two proteins together by addition of a drug potentiates the regulation of a number of biological processes, including transcription. A chimeric DNA-binding domain has been fused to the FKBP, which enables binding of the fusion protein to a specific DNA-binding sequence. A transcriptional activation domain also has been fused to FRAP. When these two fusion proteins are co-expressed in the same cell, a fully functional transcription factor can be formed by heterodimerization mediated by addition of rapamycin. The dimerized chimeric transcription factor can then bind to a synthetic promoter sequence containing copies of the synthetic DNA-binding sequence. This system has been successfully integrated into adenoviral and AAV vectors. Long term regulatable gene expression has been achieved in both mice and baboons (Ye et al., Science 283: 88-91, 1999).

[0207] Detection Methods

[0208] In patients with disorders related to the aberrant expression of NRHK1, the expression level of NRHK1 can be used as an indicator for detecting the presence of NRHK1-related diseases. Detection and measurement of the relative amount of the NRHK1 gene product can be carried out using various methods known in the art.

[0209] Typical methodologies for detecting the transcription level of a gene include extracting RNA from a cell or tissue sample, hybridizing a labeled probe to the extracted RNA or derivative thereof (such as cDNA or cRNA), and detecting the probe. Suitable methods include Northern Blot and quantitative PCR or RT-PCR. In situ hybridization can also be used to detect the transcription level of the NRHK1 gene in human tissues.

[0210] Typical methodologies for detecting a polypeptide include extracting proteins from a cell or tissue sample, binding an antibody to the target polypeptide and detecting the antibody. Suitable methods include enzyme linked immunosorbent assays (ELISAs), Western blots, immunoprecipitations, and immunofluorescence. The antibody can be polyclonal, or preferably, monoclonal. The antibody can be an intact antibody, or a fragment thereof (e.g., Fab or F(ab′)₂). The antibody can be labeled with a radioisotope, a fluorescent compound, an enzyme, an enzyme co-factor, or a detectable ligand. The term “labeled,” with regard to a probe or antibody, is intended to encompass direct labeling such as through covalent coupling, as well as indirect labeling such as being mediated by another reagent which is directly labeled. Examples of indirect labeling include labeling a primary antibody using a fluorescently labeled secondary antibody, or attaching a DNA probe with a biotin which can be detected, for example, by a fluorescence-labeled streptavidin.

[0211] Preferably, the binding affinity of the antibody to NRHK1 is at least 10⁵ M⁻¹. More preferably, the binding affinity is at least 10⁶ M⁻¹. Other methods such as electrophoresis, chromatography or direct sequencing can also be used to detect the amount of a polypeptide in a biological sample. Anti-NRHK1 antibodies can also be directly introduced into a subject. The antibody can be labeled with a radioactive marker whose presence and location in the subject can be detected using standard imaging techniques.

[0212] In one embodiment, the genomic copies of the NRHK1 gene in the genome of a human subject may indicate the presence or predisposition of a disease. Detection of the presence or number of copies of the NRHK1 gene in the genome can be performed using methods known in the art. For instance, it can be assessed using Southern Blot. The probes for Southern Blot can be labeled with a radioisotope, a fluorescent compound, an enzyme, or an enzyme co-factor.

[0213] In the field of diagnostic assays, the above-described detection methods can be used to determine the severity of NRHK1-related diseases. A biological sample is isolated from a test subject, and the presence, quantity and/or activity of NRHK1 in the sample relative to a disease free or control sample is evaluated. The expression level of NRHK1 in the biological sample can indicate the presence or severity of NRHK1-related diseases in the test subject. The term “biological sample” is intended to include tissues, cells or biological fluids isolated from the subject. A preferred biological sample is a serum sample isolated from the subject using conventional means.

[0214] Screening Methods

[0215] The present invention also provides methods for identifying NRHK1 modulators. Suitable modulators include compounds or agents comprising therapeutic moieties, such as peptides, peptidomimetics, peptoids, polynucleotides, small molecules or other drugs. These moieties can either bind to NRHK1, or have a modulatory (e.g., stimulatory or inhibitory) effect on the activity of NRHK1. In one embodiment, the moieties have a modulatory effect on the interactions of NRHK1 with one or more of its natural substrates. These moieties can also exert a modulatory effect on the expression of NRHK1. The screening assays of the present invention comprise detecting the interactions between NRHK1 and test compounds.

[0216] The test compounds of the present invention can be either small molecules or bioactive agents. In a preferred embodiment, the test compound is a small organic or inorganic molecule. In another preferred embodiment, the test compounds are polypeptides, oligopeptides, polysaccharides, nucleotides or polynucleotides.

[0217] In accordance with one aspect of this invention, methods for screening for compounds that inhibit the biological activities of NRHK1 are provided. Pharmaceutical compositions comprising these compounds can subsequently be prepared. The screening method comprises (1) contacting a sample with a compound, and (2) comparing the expression profile or biological activity of NRHK1 in the sample with that in a sample not contacted with the compound to determine whether the compound substantially decreases the expression level or activities of NRHK1. The screening method can be carried out either in vivo or in vitro.

[0218] The present invention further includes a method for screening for compounds capable of modulating the binding between NRHK1 and a binding partner. As used herein, the term “binding partner” refers to a bioactive agent which serves as either a substrate for NRHK1, or a ligand having a binding affinity to NRHK1. The bioactive agent may be selected from a variety of naturally-occurring or synthetic compounds, proteins, peptides, polysaccharides, nucleotides or polynucleotides.

[0219] Inhibitors of the expression, activity or binding ability of NRHK1 may be used as therapeutic compositions. These inhibitors can be formulated in suitable pharmaceutical compositions, as described herein below.

[0220] The present invention also provides methods for conducting high-throughput screening for compounds capable of inhibiting activity or expression of NRHK1. In one embodiment, the high-throughput screening method involves contacting test compounds with NRHK1, and then detecting the effect of the test compounds on NRHK1. Functional assays, such as cytosensor microphysiometer-based assays, calcium flux assays (e.g., FLIPR®, Molecular Devices Corp, Sunnyvale, Calif.), or the TUNEL assay, can be employed to measure NRHK1 cellular activity. Fluorescence-based techniques can be used for high-throughput and ultra high-throughput screening. They include, but are not limited to, BRET® and FRET® (both by Packard Instrument Co., Meriden, Conn.).

[0221] In a preferred embodiment, the high-throughput screening assay uses label-free plasmon resonance technology as provided by BIACORE® systems (Biacore International AB, Uppsala, Sweden). Surface plasmon resonance occurs when surface plasmon waves are excited at a metal/liquid interface. Light is directed at and reflected from the surface, and contact with a sample changes the refractive index at the surface layer, which is detected as a shift in the resonance. The refractive index change for a given change of mass concentration at the surface layer is similar for many bioactive agents (including proteins, peptides, lipids and polynucleotides), and since the BIACORE® sensor surface can be functionalized to bind a variety of these bioactive agents, detection of a wide selection of test compounds can thus be accomplished.

[0222] Monitoring Efficacy of a Drug During Clinical Trials

[0223] Using the NRHK1 detection methods of this invention, the efficacy of a therapeutic agent for NRHK1-related diseases can be monitored during clinical trials. The therapeutic agent may be a drug, small molecule, agonist, antagonist, peptidomimetic, protein, peptide, or polynucleotide. The changes in the expression or activity of the NRHK1 gene in response to treatment with the agent can be used to evaluate the therapeutic effect of the agent on patients with NRHK1-related diseases. In addition, the expression or activity of NRHK1 in response to the agent can be measured at various points during the clinical trial.

[0224] In a preferred embodiment, the method for monitoring the effectiveness of the therapeutic agent includes the steps of (i) obtaining a pre-administration sample from a subject; (ii) detecting the level of expression or activity of NRHK1 in the pre-administration sample; (iii) obtaining one or more post-administration samples from the subject; (iv) detecting the level of expression or activity of NRHK1 in the post-administration samples; and (v) comparing the level of expression or activity of NRHK1 in the pre-administration sample to the level of expression or activity of NRHK1 in the post-administration samples. The dose or frequency of the administration of the agent may be adjusted based on the effectiveness of the agent in a particular patient. Therefore, NRHK1 expression or activity can be used as an indicator of the effectiveness of a therapeutic agent for NRHK1-related diseases, even if the agent does not produce an observable phenotypic response.
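
By way of illustration only, the comparison of step (v) can be sketched as follows. The function names, the use of a simple post/pre ratio, and the 0.5 cut-off are assumptions made for this sketch and are not taken from the present description; any suitable measure of expression or activity and any suitable threshold may be substituted.

```python
from statistics import mean


def fold_changes(pre_level, post_levels):
    """Return the post/pre ratio for each post-administration sample."""
    if pre_level <= 0:
        raise ValueError("pre-administration level must be positive")
    return [post / pre_level for post in post_levels]


def classify_response(pre_level, post_levels, threshold=0.5):
    """Label the change in NRHK1 level after administration.

    `threshold` is a hypothetical cut-off: a mean ratio at or below it is
    called "decreased", a mean ratio at or above its reciprocal is called
    "increased", and anything in between is called "unchanged".
    """
    ratio = mean(fold_changes(pre_level, post_levels))
    if ratio <= threshold:
        return "decreased"
    if ratio >= 1 / threshold:
        return "increased"
    return "unchanged"


if __name__ == "__main__":
    # Made-up expression levels in arbitrary units.
    print(fold_changes(10.0, [5.0, 2.5]))
    print(classify_response(10.0, [5.0, 2.5]))
```

Such a ratio could inform the adjustment of dose or frequency discussed above, subject to the clinical judgment of the practitioner.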

[0225] Prognostic Assays

[0226] The detection methods described herein can be used to identify subjects having or at risk of developing NRHK1-related diseases. In addition, the detection methods can be used to determine whether an agent (e.g., an agonist, antagonist, peptidomimetic, protein, peptide, polynucleotide, small molecule, or other drug candidate) can be administered to a subject for effectively treating or preventing NRHK1-related diseases.

[0227] NRHK1 expression profiles at different progression stages of NRHK1-related diseases can be established. In addition, NRHK1 expression profiles in different patients who have different responses to a drug treatment can be determined. A pattern may emerge such that a particular expression profile may be correlated to an increased likelihood of a poor prognosis. Therefore, the prognostic assay of the present invention may be used to determine whether a subject undergoing a treatment for an NRHK1-related disease has a poor outlook for long-term survival or disease progression. Preferably, prognosis is performed shortly after diagnosis, such as within a few days after diagnosis. The result of prognosis can then be used to devise an individualized treatment program, thereby enhancing the effectiveness of the treatment as well as the likelihood of long-term survival and well-being.

[0228] The method of the invention can also be used to detect genetic alterations in the NRHK1 gene, thereby determining if a subject with the altered gene is at risk for damages characterized by aberrant regulation in NRHK1 activity or expression. In a preferred embodiment, the method includes detecting the presence or absence of a genetic alteration that affects the integrity of the NRHK1 gene, or detecting the aberrant expression of the NRHK1 gene. The genetic alteration can be detected by ascertaining the existence of at least one of the following: 1) deletion of one or more nucleotides from the NRHK1 gene; 2) addition of one or more nucleotides to the NRHK1 gene; 3) substitution of one or more nucleotides of the NRHK1 gene; 4) a chromosomal rearrangement in the NRHK1 gene; 5) alteration in the level of a messenger RNA transcript of the NRHK1 gene; 6) aberrant modification of the NRHK1 gene; 7) the presence of a non-wild type splicing pattern of a messenger RNA transcript of the NRHK1 gene; 8) a non-wild type level of NRHK1; 9) allelic loss of an NRHK1 gene; and 10) inappropriate post-translational modification of NRHK1.

[0229] In one embodiment, detection of the alteration involves the use of a probe/primer in a polymerase chain reaction (such as anchor PCR or RACE PCR) or alternatively, in a ligation chain reaction (LCR). LCR can be particularly useful for detecting point mutations in the NRHK1 gene. This method includes the steps of collecting a sample from a subject, isolating polynucleotides (e.g., genomic DNA, mRNA, or both) from the sample, contacting the polynucleotide with one or more primers which specifically hybridize to the NRHK1 gene or gene product, and detecting the presence or absence of an amplification product, or detecting the size of the amplification product and comparing its length to a control. It is understood that PCR and/or LCR can be used as a preliminary amplification step in conjunction with any other techniques described herein.

[0230] Alternative amplification methods include: self-sustained sequence replication (Guatelli et al., Proc. Natl. Acad. Sci. USA 87:1874-1878, 1990), transcriptional amplification system (Kwoh et al., Proc. Natl. Acad. Sci. USA 86:1173-1177, 1989), and Q-Beta Replicase (Lizardi et al., Bio/Technology 6:1197, 1988).

[0231] In another embodiment, mutations in the NRHK1 gene can be identified using restriction enzymes. Differences in restriction enzyme digestion patterns indicate mutation(s) in the NRHK1 gene or its transcripts. Moreover, sequence-specific ribozymes can be used to detect the presence of specific mutations. See, for example, U.S. Pat. No. 5,498,531.

[0232] In yet another embodiment, genetic mutations in the NRHK1 gene can be identified using high density arrays which contain a large number of oligonucleotide probes. For example, genetic mutations in the NRHK1 gene can be identified in two dimensional arrays. In this example, a first hybridization array of probes is used to scan through long stretches of DNA in a sample and a control in order to identify base changes between the two sequences. This step allows the identification of point mutations. This step is followed by a second hybridization array that allows the characterization of specific mutations by using smaller and specialized probe arrays which are complementary to all variants or mutations detected. Each mutation array is composed of parallel probe sets, one complementary to the wild-type gene and the other complementary to the mutant gene.

[0233] In still another embodiment, any sequencing reactions known in the art can be used to directly sequence the NRHK1 gene in order to detect mutations. It is contemplated that any automated sequencing procedures can be utilized, including sequencing by mass spectrometry.

[0234] In one embodiment, protection from cleavage agents is used to detect mismatched bases in RNA/RNA or RNA/DNA heteroduplexes. In general, the “mismatch cleavage” technique involves forming heteroduplexes by hybridizing an RNA or DNA (labeled) containing the wild-type NRHK1 gene sequence to a potentially mutant RNA or DNA obtained from a tissue sample. The double-stranded duplexes are treated with an agent which cleaves single-stranded regions of the duplex. The agent may be RNase (for RNA/DNA duplexes), or S1 nuclease (for DNA/DNA hybrids). In one case, either DNA/DNA or RNA/DNA duplexes are treated with piperidine and hydroxylamine, or piperidine and osmium tetroxide, in order to digest mismatched regions. After the digestion, the resulting material is separated by size on a denaturing polyacrylamide gel from which the site(s) of mutation may be determined.

[0235] In a preferred embodiment, the mismatch cleavage reaction employs one or more proteins that recognize mismatched base pairs in double-stranded DNA. Examples of these proteins include “DNA mismatch repair” enzymes. For instance, the mutY enzyme of E. coli cleaves A at G/A mismatches, and the thymine DNA glycosylase from HeLa cells cleaves T at G/T mismatches. In one case, cDNAs are prepared from mRNAs isolated from test cells. The cDNAs are then hybridized to a probe derived from the NRHK1 gene. The duplex thus formed is treated with a DNA mismatch repair enzyme, and the cleavage products, if any, can be detected by electrophoresis protocols or the like. See, for example, U.S. Pat. No. 5,459,039.

[0236] In another embodiment, alterations in electrophoretic mobility are used to identify mutations in the NRHK1 gene. Differences in electrophoretic mobility between mutant and wild type polynucleotides can be detected using single strand conformation polymorphism (SSCP). The resulting alteration in electrophoretic mobility enables the detection of a single base change. The DNA fragments can be labeled or detected with probes. In one case, the sensitivity of the assay is enhanced by using RNA, in which the secondary structure is more sensitive to a change in sequence. In a preferred embodiment, the assay utilizes heteroduplex analysis to separate double stranded heteroduplex molecules on the basis of changes in electrophoretic mobility (Keen et al., Trends Genet 7:5, 1991).

[0237] In yet another embodiment, the movement of mutant or wild-type fragments is evaluated using denaturing gradient gel electrophoresis (DGGE). For this purpose, DNA fragments can be modified to ensure that they do not completely denature. For instance, a GC clamp of approximately 40 GC-rich base pairs can be added to the DNA fragment using PCR. In a further embodiment, a temperature gradient is used in place of a denaturing gradient (Rosenbaum and Reissner, Biophys Chem 265:12753, 1987).

[0238] Examples of other techniques for detecting point mutations include, but are not limited to, selective oligonucleotide hybridization, selective amplification, or selective primer extension. In one embodiment, oligonucleotide primers for specific amplification carry the mutation of interest in the center of the molecule (so that amplification depends on differential hybridization) or at the extreme 3′ end of one primer where, under appropriate conditions, mismatch can prevent or reduce polymerase extension. See, for example, Saiki et al., Proc. Natl. Acad. Sci. USA 86:6230, 1989. In addition, it may be desirable to introduce a novel restriction site in the region of the mutation to create cleavage-based detection.
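
As a non-limiting sketch of the 3′-end placement described above, a pair of forward primers can be derived from a reference sequence so that the last base of each primer lies on the variant position. The function name, the 0-based position convention, the default primer length, and the sequence in the usage line are hypothetical and chosen only for illustration; practical primer design would also consider melting temperature, secondary structure, and specificity.

```python
def allele_specific_primers(wild_type, position, variant_base, length=20):
    """Return (wild_type_primer, mutant_primer) for one variant site.

    `position` is the 0-based index of the variant base in `wild_type`.
    Both primers end at that index, so a template that does not match the
    primer's 3' terminal base gives a 3' mismatch.
    """
    if not 0 <= position < len(wild_type):
        raise ValueError("position lies outside the sequence")
    start = position - length + 1
    if start < 0:
        raise ValueError("not enough upstream sequence for this length")
    wt_primer = wild_type[start:position + 1]
    mutant_primer = wt_primer[:-1] + variant_base
    return wt_primer, mutant_primer


if __name__ == "__main__":
    # Made-up 24-base sequence; not an NRHK1 sequence.
    reference = "ACGTACGTACGTACGTACGTACGT"
    print(allele_specific_primers(reference, 21, "T", length=10))
```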

[0239] The methods described herein can be performed using prepackaged diagnostic kits which comprise at least one polynucleotide probe or one antibody of the present invention. These kits can be used in clinical settings to diagnose subjects exhibiting symptoms or family history of an NRHK1-related disease. Any cell type or tissue in which NRHK1 is expressed can be used for prognostic or diagnostic purposes.

[0240] Prophylactic Methods

[0241] This invention also provides methods for preventing diseases associated with aberrant NRHK1 expression or activity. The methods comprise administering to a target subject an agent which modulates NRHK1 expression or activity.

[0242] Subjects at risk of diseases which are caused by or attributed to aberrant NRHK1 expression or activity can be identified using the diagnostic or prognostic assays described herein. A prophylactic agent can be administered prior to the manifestation of NRHK1-related disease symptoms in order to prevent or delay NRHK1-related diseases. Suitable prophylactic agents include mutant NRHK1 proteins, NRHK1 antagonist agents, or NRHK1 antisense polynucleotides.

[0243] The prophylactic methods of this invention can be specifically tailored or modified based on knowledge obtained from the study of pharmacogenomics. Pharmacogenomics includes the application of genomics technologies, such as gene sequencing, statistical genetics, and gene expression analysis, to drugs which are either in clinical development or on the market. Pharmacogenomics can be used to determine a subject's response to a drug (e.g., a subject's “drug response phenotype” or “drug response genotype”). Thus, another aspect of this invention is to provide methods for tailoring an individual's prophylactic or therapeutic treatment using NRHK1 modulators according to the individual's drug response genotype. Pharmacogenomics allows a clinician or physician to target prophylactic or therapeutic treatments to subjects who will most benefit from the treatment and to avoid treatment of subjects who will experience toxic drug-related side effects.

[0244] One pharmacogenomics approach to identify genes that predict drug response, known as “a genome-wide association,” relies primarily on a high-resolution map of the human genome consisting of already known gene-related sites (e.g., a “bi-allelic” gene marker map which consists of 60,000-100,000 polymorphic or variable sites on the human genome, each of which has two variants). Such a high-resolution genetic map can be compared to a map of the genome of each of a statistically substantial number of subjects taking part in a Phase II/III drug trial in order to identify genes associated with a particular observed drug response or side effect. Alternatively, such a high-resolution map can be generated from a combination of some ten million known single nucleotide polymorphisms (SNPs) in the human genome. A “SNP” is a common alteration that occurs in a single nucleotide base in a stretch of DNA. For example, a SNP may occur once per every 1000 bases of DNA. A SNP may be involved in a disease process. However, the vast majority of SNPs may not be related to diseases. Given a genetic map based on the occurrence of SNPs, individuals can be grouped into genetic categories depending on a particular pattern of SNPs in their individual genome. In such a manner, treatment regimens can be tailored to groups of genetically similar individuals, taking into account traits that may be common among such genetically similar individuals. Thus, mapping of the NRHK1 gene to SNP maps of patients with NRHK1-related diseases may facilitate the identification of drug-response-prediction genes.
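
The grouping of individuals by SNP pattern described above can be illustrated by the following minimal sketch. The subject identifiers, SNP identifiers, and genotype calls are invented for the example, and exact-match grouping is an assumption of the sketch; an actual study would typically rely on statistical association testing across many markers.

```python
from collections import defaultdict


def group_by_snp_pattern(genotypes, snp_ids):
    """Group subjects that share the same calls at the listed SNPs.

    `genotypes` maps a subject identifier to a dict of SNP id -> call.
    Returns a dict mapping each observed pattern (a tuple of calls, in
    the order of `snp_ids`) to the list of subjects carrying it.
    """
    groups = defaultdict(list)
    for subject, calls in genotypes.items():
        pattern = tuple(calls[snp] for snp in snp_ids)
        groups[pattern].append(subject)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical subjects and markers.
    genotypes = {
        "subject_1": {"snp_a": "A/A", "snp_b": "C/T"},
        "subject_2": {"snp_a": "A/G", "snp_b": "C/T"},
        "subject_3": {"snp_a": "A/A", "snp_b": "C/T"},
    }
    print(group_by_snp_pattern(genotypes, ["snp_a", "snp_b"]))
```

Drug responses observed within each group could then be compared between groups.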

[0245] Alternatively, the “candidate gene approach” can be utilized to identify genes that predict drug response. According to this method, if a gene that encodes a drug target is known, all common variants of that gene can be easily identified in the population. It then can be determined if a particular drug response is associated with one version of the gene versus another.

[0246] The activity of drug metabolizing enzymes is a major determinant of both the intensity and duration of drug action. The discovery of genetic polymorphisms of drug metabolizing enzymes (e.g., N-acetyltransferase 2 (NAT2) and cytochrome P450 enzymes CYP2D6 and CYP2C19) has provided an explanation as to why some subjects do not obtain the expected drug effects or show exaggerated drug response and serious toxicity after taking the standard and safe dose of a drug. These polymorphisms are expressed in two phenotypes in the population, extensive metabolizer and poor metabolizer. The prevalence of poor metabolizer phenotypes is different among different populations. For example, the gene coding for CYP2D6 is highly polymorphic and several mutations have been identified in poor metabolizers, which all lead to the absence of functional CYP2D6. Poor metabolizers of CYP2D6 and CYP2C19 quite frequently experience exaggerated drug response and side effects when they receive standard doses. If a metabolite is the active therapeutic moiety, poor metabolizers show no therapeutic response. The other extreme are the so-called ultra-rapid metabolizers who do not respond to standard doses. Recently, the molecular basis of ultra-rapid metabolism has been identified to be due to CYP2D6 gene amplification.

[0247] In one embodiment, the “gene expression profiling” method can be utilized to identify genes that predict drug response. In this regard, the gene expression profile of an animal dosed with a drug can give an indication of whether the gene pathways related to toxicity have been turned on.

[0248] Information generated from the above pharmacogenomics approaches can be used to determine the appropriate dosage or treatment regimen suitable for a particular individual. This knowledge can avoid adverse reactions or therapeutic failure, and therefore enhance therapeutic or prophylactic efficiency when treating a subject with an NRHK1 modulator.

[0249] Therapeutic Methods

[0250] As described above, the present invention includes therapeutic methods for treating a subject at risk for, susceptible to, or diagnosed with NRHK1-related diseases. The therapeutic methods can be individually tailored based on the subject's drug response genotype. Typically, the therapeutic methods comprise modulating the expression or activity of NRHK1 in the subject. In one embodiment, the method comprises contacting a plurality of cells in the subject with an agent that inhibits the expression or activity of NRHK1. Suitable agents include polynucleotides (e.g., antisense oligonucleotides of NRHK1), polypeptides (e.g., a dominant negative mutant of NRHK1), polysaccharides, naturally-occurring target molecules of NRHK1 protein (e.g., an NRHK1 protein substrate or receptor), anti-NRHK1 antibodies, NRHK1 antagonists, or other small organic and inorganic molecules. They may also include vectors comprising polynucleotides encoding NRHK1 inhibitors or antisense sequences. Moreover, the agents can be anti-NRHK1 antibodies conjugated with therapeutic moieties. Suitable agents can be identified using the screening assays of the present invention.

[0251] Pharmaceutical Compositions

[0252] The present invention is further directed to pharmaceutical compositions comprising an NRHK1 modulator and a pharmaceutically acceptable carrier. As used herein, a “pharmaceutically acceptable carrier” is intended to include any and all solvents, solubilizers, fillers, stabilizers, binders, absorbents, bases, buffering agents, lubricants, controlled release vehicles, diluents, emulsifying agents, humectants, dispersion media, coatings, antibacterial or antifungal agents, isotonic and absorption delaying agents, and the like, compatible with pharmaceutical administration. The use of such media and agents for pharmaceutically active substances is well-known in the art. Except insofar as any conventional media or agent is incompatible with the active compound, use thereof in the compositions is contemplated. Supplementary agents can also be incorporated into the compositions.

[0253] A pharmaceutical composition of the invention is formulated to be compatible with its intended route of administration. Examples of routes of administration include parenteral, e.g., intravenous, intradermal, subcutaneous, oral (e.g., inhalation), transdermal (topical), transmucosal, and rectal administration. Solutions or suspensions used for parenteral, intradermal, or subcutaneous application can include the following components: a sterile diluent such as water for injection, saline solution, fixed oils, polyethylene glycols, glycerine, propylene glycol or other synthetic solvents; antibacterial agents such as benzyl alcohol or methyl parabens; antioxidants such as ascorbic acid or sodium bisulfite; chelating agents such as ethylenediaminetetraacetic acid; buffers such as acetates, citrates or phosphates; and agents for the adjustment of tonicity such as sodium chloride or dextrose. The pH can be adjusted with acids or bases, such as hydrochloric acid or sodium hydroxide. The parenteral preparation can be enclosed in ampoules, disposable syringes or multiple dose vials made of glass or plastic.

[0254] Pharmaceutical compositions suitable for injectable use include sterile aqueous solutions or dispersions and sterile powders for the extemporaneous preparation of sterile injectable solutions or dispersions. For intravenous administration, suitable carriers include physiological saline, bacteriostatic water, Cremophor EL™ (BASF, Parsippany, N.J.) or phosphate buffered saline (PBS). In all cases, the injectable composition should be sterile and should be fluid to the extent that easy syringability exists. It must be stable under the conditions of manufacture and storage and must be preserved against the contaminating action of microorganisms such as bacteria and fungi. The carrier can be a solvent or dispersion medium containing, for example, water, ethanol, polyol (for example, glycerol, propylene glycol, and liquid polyethylene glycol, and the like), and suitable mixtures thereof. The proper fluidity can be maintained, for example, by the use of a coating such as lecithin, by the maintenance of the required particle size in the case of dispersion and by the use of surfactants. Prevention of the action of microorganisms can be achieved by various antibacterial and antifungal agents, for example, parabens, chlorobutanol, phenol, ascorbic acid, thimerosal, and the like. In many cases, it will be preferable to include isotonic agents, for example, sugars, polyalcohols such as mannitol and sorbitol, or sodium chloride in the composition. Prolonged absorption of the injectable compositions can be brought about by including in the composition an agent which delays absorption, for example, aluminum monostearate and gelatin.

[0255] Sterile injectable solutions can be prepared by incorporating the active modulator (e.g., an anti-NRHK1 antibody, an NRHK1 activity inhibitor, or a gene therapy vector expressing antisense nucleotide to NRHK1) in the required amount in an appropriate solvent, followed by filter sterilization. Generally, dispersions are prepared by incorporating the active compound into a sterile vehicle which contains a basic dispersion medium and the required other ingredients from those enumerated above. In the case of sterile powders for the preparation of sterile injectable solutions, the preferred methods of preparation are vacuum drying and freeze-drying, which yield a powder of the active ingredient plus any additional desired ingredient from a previously sterile-filtered solution thereof.

[0256] Oral compositions generally include an inert diluent or an edible carrier. They can be enclosed in gelatin capsules or compressed into tablets. For the purpose of oral therapeutic administration, the active compound can be incorporated with excipients and used in the form of tablets, troches, or capsules. Oral compositions can also be prepared using a fluid carrier for use as a mouthwash, wherein the compound in the fluid carrier is applied orally and swished and expectorated or swallowed. Pharmaceutically compatible binding agents and/or adjuvant materials can be included as part of the composition. The tablets, pills, capsules, troches and the like can contain any of the following ingredients, or compounds of a similar nature: a binder such as microcrystalline cellulose, gum tragacanth or gelatin; an excipient such as starch or lactose; a disintegrating agent such as alginic acid, Primogel, or corn starch; a lubricant such as magnesium stearate or Sterotes; a glidant such as colloidal silicon dioxide; a sweetening agent such as sucrose or saccharin; or a flavoring agent such as peppermint, methyl salicylate, or orange flavoring.

[0257] For administration by inhalation, the compounds are delivered in the form of an aerosol spray from a pressurized container or dispenser which contains a suitable propellant, e.g., a gas such as carbon dioxide, or a nebulizer.

[0258] Systemic administration can also be by transmucosal or transdermal means. For transmucosal or transdermal administration, penetrants appropriate to the barrier to be permeated are used in the formulation. Such penetrants are generally known in the art, and include, for example, for transmucosal administration, detergents, bile salts, and fusidic acid derivatives. Transmucosal administration can be accomplished through the use of nasal sprays or suppositories. For transdermal administration, the bioactive compounds are formulated into ointments, salves, gels, or creams as generally known in the art.

[0259] The compounds can also be prepared in the form of suppositories (e.g., with conventional suppository bases such as cocoa butter and other glycerides) or retention enemas for rectal delivery.

[0260] In one embodiment, the therapeutic moieties, which may contain a bioactive compound, are prepared with carriers that will protect the compound against rapid elimination from the body, such as a controlled release formulation, including implants and microencapsulated delivery systems. Biodegradable, biocompatible polymers can be used, such as ethylene vinyl acetate, polyanhydrides, polyglycolic acid, collagen, polyorthoesters, and polylactic acid. Methods for preparation of such formulations will be apparent to those skilled in the art. The materials can also be obtained commercially from e.g. Alza Corporation and Nova Pharmaceuticals, Inc. Liposomal suspensions (including liposomes targeted to infected cells with monoclonal antibodies to viral antigens) can also be used as pharmaceutically acceptable carriers. These can be prepared according to methods known to those skilled in the art, for example, as described in U.S. Pat. No. 4,522,811.

[0261] It is especially advantageous to formulate oral or parenteral compositions in dosage unit form for ease of administration and uniformity of dosage. Dosage unit form as used herein includes physically discrete units suited as unitary dosages for the subject to be treated, each unit containing a predetermined quantity of active compound calculated to produce the desired therapeutic effect in association with the required pharmaceutical carrier. The specifications for the dosage unit forms of the invention are dictated by and directly dependent on the unique characteristics of the active compound and the particular therapeutic effect to be achieved, and the limitations inherent in the art of compounding such an active compound for the treatment of individuals.

[0262] Toxicity and therapeutic efficacy of such compounds can be determined by standard pharmaceutical procedures in cell cultures or experimental animals, e.g., for determining the LD50 (the dose lethal to 50% of the population) and the ED50 (the dose therapeutically effective in 50% of the population). The dose ratio between toxic and therapeutic effects is the therapeutic index and it can be expressed as the ratio LD50/ED50. Compounds which exhibit large therapeutic indices are preferred. While compounds that exhibit toxic side effects may be used, care should be taken to design a delivery system that targets such compounds to the site of affected tissue in order to minimize potential damage to uninfected cells and, thereby, reduce side effects.
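
The ratio LD50/ED50 described above can be computed as in the following sketch. The function name and the numerical values in the usage line are illustrative assumptions and do not represent data for any NRHK1 modulator; both doses are assumed to be expressed in the same units.

```python
def therapeutic_index(ld50, ed50):
    """Return the therapeutic index, i.e. the ratio LD50/ED50.

    Both doses must be positive and expressed in the same units
    (for example, mg/kg).
    """
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


if __name__ == "__main__":
    # Made-up doses: LD50 of 500 mg/kg and ED50 of 25 mg/kg.
    print(therapeutic_index(500.0, 25.0))
```

A larger value indicates a wider margin between the therapeutically effective dose and the lethal dose.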

[0263] The data obtained from the cell culture assays and animal studies can be used in formulating a range of dosage for use in humans. The dosage of such compounds lies preferably within a range of circulating concentrations that include the ED50 with little or no toxicity. The dosage may vary within this range depending upon the dosage form employed and the route of administration utilized. For any compound used in the method of the invention, the therapeutically effective dose can be estimated initially from cell culture assays. A dose may be formulated in animal models to achieve a circulating plasma concentration range that includes the IC50 (i.e., the concentration of the test compound which achieves a half-maximal inhibition of symptoms) as determined in cell culture. Such information can be used to more accurately determine useful doses in humans. Levels in plasma may be measured, for example, by high performance liquid chromatography.
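
As one possible illustration of estimating an IC50 from cell culture data, the sketch below interpolates on a logarithmic concentration scale between the two tested concentrations that bracket 50% inhibition. The interpolation scheme, the function name, and the data in the usage line are assumptions made for this example; a fitted dose-response curve is a common alternative.

```python
import math


def estimate_ic50(concentrations, inhibitions):
    """Estimate the IC50 by log-linear interpolation.

    `concentrations` must be positive and in ascending order;
    `inhibitions` holds the matching percent inhibition values.
    """
    points = list(zip(concentrations, inhibitions))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(points, points[1:]):
        if i_lo <= 50 <= i_hi and i_hi > i_lo:
            fraction = (50 - i_lo) / (i_hi - i_lo)
            log_ic50 = math.log10(c_lo) + fraction * (
                math.log10(c_hi) - math.log10(c_lo)
            )
            return 10 ** log_ic50
    raise ValueError("50% inhibition is not bracketed by the data")


if __name__ == "__main__":
    # Made-up concentrations (arbitrary units) and percent inhibition.
    print(estimate_ic50([1.0, 10.0, 100.0], [20.0, 40.0, 60.0]))
```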

[0264] The pharmaceutical compositions can be included in a container, pack, or dispenser together with instructions for administration.

[0265] Kits

[0266] The invention also encompasses kits for detecting the presence of an NRHK1 gene product in a biological sample. An example kit comprises reagents for assessing expression of NRHK1 at mRNA or protein level. Preferably, the reagents include an antibody or fragment thereof, wherein the antibody or fragment specifically binds to NRHK1. Optionally, the kits may comprise a polynucleotide probe capable of specifically binding to a transcript of the NRHK1 gene. The kit may also contain means for determining the amount of NRHK1 protein or mRNA in the test sample, and/or means for comparing the amount of NRHK1 protein or mRNA in the test sample to a control or standard. The compound or agent can be packaged in a suitable container.

[0267] The invention further provides kits for assessing the suitability of each of a plurality of compounds for inhibiting NRHK1-related diseases in cells or human subjects.

[0268] Such kits include a plurality of compounds to be tested, and a reagent (such as an antibody specific to NRHK1 proteins, or a polynucleotide probe or primer capable of hybridizing to the NRHK1 gene) for assessing expression of NRHK1.

[0269] It should be understood that the above-described embodiments are given by way of illustration, not limitation. Various changes and modifications within the scope of the present invention will become apparent to those skilled in the art from the present description.

[0270] Host Cells

[0271] Another aspect of the invention pertains to host cells into which a polynucleotide molecule of the invention is introduced, e.g., an NRHK1 gene or homolog thereof, within an expression vector, a gene delivery vector, or a polynucleotide molecule of the invention containing sequences which allow it to homologously recombine into a specific site of the host cell's genome. The terms “host cell” and “recombinant host cell” are used interchangeably herein. It is understood that such terms refer not only to the particular subject cell but to the progeny or potential progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term as used herein.

[0272] A host cell can be any prokaryotic or eukaryotic cell. For example, an NRHK1 gene can be expressed in bacterial cells such as E. coli, insect cells, yeast or mammalian cells (such as Chinese hamster ovary cells (CHO), COS cells, Fischer 344 rat cells, HLA-B27 rat cells, HeLa cells, A549 cells, or 293 cells). Other suitable host cells are known to those skilled in the art.

[0273] Vector DNA can be introduced into prokaryotic or eukaryotic cells via conventional transformation or transfection techniques. As used herein, the terms “transformation” and “transfection” are intended to refer to a variety of art-recognized techniques for introducing foreign polynucleotide (e.g., DNA) into a host cell, including calcium phosphate or calcium chloride co-precipitation, DEAE-dextran-mediated transfection, lipofection, or electroporation.

[0274] For stable transfection of mammalian cells, it is known that, depending upon the expression vector and transfection technique used, only a small fraction of cells may integrate the foreign DNA into their genome. In order to identify and select these integrants, a gene that encodes a selectable flag (e.g., resistance to antibiotics) is generally introduced into the host cells along with the gene of interest. Preferred selectable flags include those which confer resistance to drugs, such as G418, hygromycin and methotrexate. A polynucleotide encoding a selectable flag can be introduced into a host cell by the same vector as that encoding NRHK1 or can be introduced by a separate vector. Cells stably transfected with the introduced polynucleotide can be identified by drug selection (e.g., cells that have incorporated the selectable flag gene will survive, while the other cells die).

[0275] A host cell of the invention, such as a prokaryotic or eukaryotic host cell in culture, can be used to produce (i.e., express) NRHK1. Accordingly, the invention further provides methods for producing NRHK1 using the host cells of the invention. In one embodiment, the method comprises culturing the host cell of the invention (into which a recombinant expression vector containing an NRHK1 gene has been introduced) in a suitable medium such that NRHK1 is produced. In another embodiment, the method further comprises isolating NRHK1 from the medium or the host cell.

[0276] Transgenic and Knockout Animals

[0277] The host cells of the invention can also be used to produce non-human transgenic animals. For example, in one embodiment, a host cell of the invention is a fertilized oocyte or an embryonic stem cell into which NRHK1-coding sequences have been introduced. Such host cells can then be used to create non-human transgenic animals in which exogenous sequences encoding NRHK1 have been introduced into their genome or homologous recombinant animals in which endogenous sequences encoding NRHK1 have been altered. Such animals are useful for studying the function and/or activity of NRHK1 and for identifying and/or evaluating modulators of NRHK1 activity. As used herein, a “transgenic animal” is a non-human animal, preferably a mammal, more preferably a rodent such as a rat or mouse, in which one or more of the cells of the animal includes a transgene. Other examples of transgenic animals include non-human primates, sheep, dogs, cows, goats, chickens, amphibians, and the like. A transgene is exogenous DNA which is integrated into the genome of a cell from which a transgenic animal develops and which remains in the genome of the mature animal, thereby directing the expression of an encoded gene product in one or more cell types or tissues of the transgenic animal. As used herein, a “homologous recombinant animal” or “knockout animal” is a non-human animal, preferably a mammal, more preferably a mouse, in which an endogenous NRHK1 gene has been altered by homologous recombination between the endogenous gene and an exogenous DNA molecule introduced into a cell of the animal, e.g., an embryonic cell of the animal, prior to development of the animal.

[0278] A transgenic animal of the invention can be created by introducing an NRHK1-encoding polynucleotide into the male pronuclei of a fertilized oocyte, e.g., by microinjection or retroviral infection, and allowing the oocyte to develop in a pseudopregnant female foster animal. Intronic sequences and polyadenylation signals can also be included in the transgene to increase the efficiency of expression of the transgene. A tissue-specific regulatory sequence(s) can be operably linked to a transgene to direct expression of NRHK1 to particular cells. Methods for generating transgenic animals via embryo manipulation and microinjection, particularly animals such as mice, have become conventional in the art. Similar methods are used for production of other transgenic animals. A transgenic founder animal can be identified based upon the presence of a transgene of the invention in its genome and/or expression of mRNA corresponding to a gene of the invention in tissues or cells of the animals. A transgenic founder animal can then be used to breed additional animals carrying the transgene. Moreover, transgenic animals carrying a transgene encoding NRHK1 can further be bred to other transgenic animals carrying other transgenes.

[0279] To create a homologous recombinant animal (knockout animal), a vector is prepared which contains at least a portion of a gene of the invention into which a deletion, addition or substitution has been introduced to thereby alter, e.g., functionally disrupt, the gene. The gene can be a human gene, but more preferably, is a non-human homolog of a human gene of the invention (e.g., a homolog of the NRHK1 gene). For example, a mouse gene can be used to construct a homologous recombination polynucleotide molecule, e.g., a vector, suitable for altering an endogenous gene of the invention in the mouse genome. In a preferred embodiment, the homologous recombination polynucleotide molecule is designed such that, upon homologous recombination, the endogenous gene of the invention is functionally disrupted (i.e., no longer encodes a functional protein; also referred to as a “knockout” vector). Alternatively, the homologous recombination polynucleotide molecule can be designed such that, upon homologous recombination, the endogenous gene is mutated or otherwise altered but still encodes functional protein (e.g., the upstream regulatory region can be altered to thereby alter the expression of the endogenous NRHK1 gene). In the homologous recombination polynucleotide molecule, the altered portion of the gene of the invention is flanked at its 5′ and 3′ ends by additional polynucleotide sequence of the gene of the invention to allow for homologous recombination to occur between the exogenous gene carried by the homologous recombination polynucleotide molecule and an endogenous gene in a cell, e.g., an embryonic stem cell. The additional flanking polynucleotide sequence is of sufficient length for successful homologous recombination with the endogenous gene.

[0280] Typically, several kilobases of flanking DNA (both at the 5′ and 3′ ends) are included in the homologous recombination polynucleotide molecule. The homologous recombination polynucleotide molecule is introduced into embryonic stem cells by electroporation. The cells in which the introduced gene has homologously recombined with the endogenous gene are selected. The selected cells can then be injected into a blastocyst of an animal (e.g., a mouse) to form aggregation chimeras. A chimeric embryo can then be implanted into a suitable pseudopregnant female foster animal and the embryo brought to term. Progeny harboring the homologously recombined DNA in their germ cells can be used to breed animals in which all cells of the animal contain the homologously recombined DNA by germline transmission of the homologously recombined DNA. Methods for constructing homologous recombination polynucleotide molecules, e.g., vectors, or homologous recombinant animals are well known in the art.

[0281] In another embodiment, transgenic non-human animals can be produced which contain selected systems which allow for regulated expression of the transgene. One example of such a system is the cre/loxP recombinase system of bacteriophage P1. Another example of a recombinase system is the FLP recombinase system of Saccharomyces cerevisiae (see e.g., O'Gorman et al., Science 251:1351-1355, 1991). If a cre/loxP recombinase system is used to regulate expression of the transgene, animals containing transgenes encoding both the Cre recombinase and a selected protein are required. Such animals can be provided through the construction of “double” transgenic animals, e.g., by mating two transgenic animals, one containing a transgene encoding a selected protein and the other containing a transgene encoding a recombinase.

[0282] Clones of the non-human transgenic animals described herein can also be produced according to the methods described in Wilmut, I. et al., Nature 385:810-813, 1997, and PCT International Publication Nos. WO97/07668 and WO97/07669. In brief, a cell, e.g., a somatic cell, from the transgenic animal can be isolated and induced to exit the growth cycle and enter G0 phase. The quiescent cell can then be fused, e.g., through the use of electrical pulses, to an enucleated oocyte from an animal of the same species from which the quiescent cell is isolated. The reconstructed oocyte is then cultured such that it develops to a morula or blastocyst and is then transferred to a pseudopregnant female foster animal. The offspring borne of this female foster animal will be a clone of the animal from which the cell, e.g., the somatic cell, is isolated.

EXAMPLES

Example 1

[0283] Identification of NRHK1 Sequence in Human Genome Database

[0284] The nucleic acid sequence of NRHK1 was obtained from a newly developed genomic prediction pipeline. Briefly, the X-ray crystal structures of the catalytic domains of protein kinases were collected and aligned according to their structural identities and similarities. The alignment was converted into a “scoring matrix” which carried the structural profile of the kinase catalytic domains. This scoring matrix was then used to search the Celera Human Genome database for sequences that have kinase catalytic domains.
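The general idea of converting an alignment into a scoring matrix and scanning sequences with it can be illustrated with the following minimal sketch. It is not the prediction pipeline described above: it assumes a simple gap-free sequence alignment rather than a structure-based one, a uniform background frequency, and a pseudocount of 1. The function names `build_profile` and `scan`, and the toy alignment in the usage note, are hypothetical and introduced only for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_seqs, pseudocount=1.0):
    """Build a log-odds scoring matrix (one dict per column) from
    equal-length, gap-free aligned sequences."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    total = len(aligned_seqs) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned_seqs)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts.get(aa, 0) + pseudocount) / total
            column[aa] = math.log2(freq / background)
        profile.append(column)
    return profile


def scan(profile, sequence):
    """Slide the profile along the sequence without gaps and return
    (best_score, start_index) of the highest-scoring window."""
    width = len(profile)
    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20 standard amino acids score 0.
        score = sum(profile[i].get(aa, 0.0) for i, aa in enumerate(window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start
```

For example, a profile built from the toy alignment `["HRNLKPSN", "HRDLKPEN", "HRDIKPEN"]` and scanned against `"AAAAHRDLKPENAAAA"` places the best window at index 4. A production search would use a gapped, statistically calibrated method rather than this ungapped scan.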

Example 2

[0285] BLAST Analysis

[0286] Sequence alignments between NRHK1 and other sequences in the GenBank database were performed using the standard protein-protein BLAST (blastp), standard nucleotide-nucleotide BLAST (blastn), BLAST2 Sequences, and human genome BLAST programs that are available at NCBI's BLAST website.

[0287] A standard protein-protein BLAST search in the “nr” database (available at NCBI's BLAST website) with “Filter” setting unchecked, “Expect” setting at 10.0, “Word Size” setting at 3, “Matrix” setting at BLOSUM62, “Gap costs” setting at Existence:11 and Extension:1, identified partial amino acid sequence similarities between NRHK1 and a number of proteins. These proteins include, but are not limited to, Populus x canescens NIMA-related protein kinase (Entrez accession number: AF469649, 28% alignment to amino acid residues 25-285 of NRHK1), L. esculentum LSTK-1-like kinase (Entrez accession number: AF079103, 27% alignment to amino acid residues 25-321 of NRHK1), C. elegans hypothetical protein T07A9.3 (Entrez accession number: AF036706, 25% alignment to amino acid residues 27-307 of NRHK1), and human NIMA-related kinase 1 (Entrez accession number: XM_048605, 24% alignment to amino acid residues 25-297 of NRHK1).

[0288] A conserved domain search was performed within the standard protein-protein BLAST search with the RPS-BLAST 2.2.3 [Apr. 24, 2002] program. Amino acid residues 28-297 of NRHK1 share high homology with the consensus sequences of the catalytic domain of ser/thr protein kinase (Entrez accession number: smart00220, 100.0% alignment), the kinase domain of pkinase (Entrez accession number: pfam00069, 100.0% alignment), and the catalytic domain of tyrosine kinase (Entrez accession number: smart00219, 87.5% alignment).

[0289] A standard nucleotide-nucleotide BLAST search in the “nr” database (available at NCBI's BLAST website) with “Filter” setting unchecked, “Expect” setting at 10.0, “Word Size” setting at 3, identified only one nucleotide sequence: a putative human cDNA LOC169436 (XM_095696, SEQ ID NO:4) that showed significant homology to nucleotides 88-174 (100% identities), 174-583 (100% identities), and 575-2403 (99% identities) of NRHK1.

[0290] A standard nucleotide-nucleotide BLAST search in the “pat” database (available at NCBI's BLAST website) with “Filter” setting unchecked, “Expect” setting at 10.0, “Word Size” setting at 3, identified significant nucleotide sequence similarities between NRHK1 and a human protein kinase-like protein SGK071 (Entrez accession number: AX056458, SEQ ID NOS:5 and 6), which was disclosed in PCT patent application WO00/73469. Further analysis using the pairwise BLAST algorithm revealed that NRHK1 and SGK071 share 84% sequence identity at the amino acid level (blastp, matrix: BLOSUM62, gap open: 11, gap extension: 1, x_dropoff: 50, expect: 10.0, wordsize: 3, filter: unchecked), and 90% sequence identity at the nucleotide level (blastn, match: 1, mismatch: −2, gap open: 5, gap extension: 0, x_dropoff: 50, expect: 10.0, wordsize: 11, filter: unchecked).
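For readers unfamiliar with how a percent-identity figure is derived from a pairwise alignment, the following minimal sketch counts identical positions over the aligned columns, with gap columns included in the denominator. It assumes the two sequences have already been aligned and padded with "-" for gaps. The function name `percent_identity` and the short example strings are hypothetical, and the sketch does not reproduce BLAST's alignment step.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the columns of a pairwise alignment.

    Columns where both sequences carry a gap are ignored; a column
    with a gap in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # empty column, carries no information
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        return 0.0
    return 100.0 * matches / columns
```

As a usage example, `percent_identity("MLGPGS-NRR", "MLGAGSQNRR")` compares ten columns, of which eight are identical, and so returns 80.0.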

[0291] A human genome search was carried out using the blastn program with Expect setting at 0.01, Filter setting at default, Descriptions setting at 100, and Alignment settings at 100. The NRHK1 gene was mapped to locus 9q34 of human chromosome 9.

[0292] Specifically, the NRHK1 gene is located between genes LOC157890 and LOC57109, and overlaps with gene LOC169436 on chromosome 9. All twenty-one exons of the NRHK1 gene were mapped to nucleotides 110001 to 137201 in human chromosome 9 of the Entrez Human Genome Sequence Database maintained by NCBI. The exons were also mapped to the Celera genomic database (SEQ ID NO:3). The exons/introns in the NRHK1 gene were determined using the program “sim4” described by Florea et al. in “A computer program for aligning a cDNA sequence with a genomic DNA sequence.” Genome Res. 8:967-974, 1998.

Example 3

[0293] Hydrophobicity Analysis

[0294] The hydrophobicity profile of the NRHK1 sequence (FIG. 5) was generated using the GES (Goldman, Engelman and Steitz) hydrophobicity scale (Engelman, D. M., Steitz, T. A. and Goldman, A. Identifying nonpolar transmembrane helices in amino acid sequences of membrane proteins. Ann. Rev. Biophys. Biophys. Chem. 15:321-353, 1986). Briefly, the GES scale is used to identify nonpolar transbilayer helices. The curve is the average of a residue-specific hydrophobicity scale over a window of 20 residues. When the curve is in the upper half of the frame (positive), it indicates a hydrophobic region, and when it is in the lower half (negative), a hydrophilic region.

[0295] In FIG. 5, the X-axis represents the length of the protein in amino acids (aa), while the Y-axis represents the GES score. The curved line shows the GES pattern of the entire protein, while the straight line represents a cutoff for potential membrane-spanning domains. The hydrophobicity profile indicates that NRHK1 is probably not a membrane protein.
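The windowed averaging used to produce a hydrophobicity profile of this kind can be sketched as follows. This is an illustrative sketch only, not the program used to generate FIG. 5. The per-residue values in `GES` are the GES scale as commonly tabulated with hydrophobic residues positive, and should be verified against the cited reference before any real use. The function names and the `cutoff` argument are hypothetical, since no numerical cutoff is stated herein.

```python
# GES values as commonly tabulated (hydrophobic = positive).
# Assumption: verify against Engelman et al., 1986 before relying on them.
GES = {
    "F": 3.7, "M": 3.4, "I": 3.1, "L": 2.8, "V": 2.6,
    "C": 2.0, "W": 1.9, "A": 1.6, "T": 1.2, "G": 1.0,
    "S": 0.6, "P": -0.2, "Y": -0.7, "H": -3.0, "Q": -4.1,
    "N": -4.8, "E": -8.2, "K": -8.8, "D": -9.2, "R": -12.3,
}


def hydropathy_profile(sequence, scale=GES, window=20):
    """Average the per-residue scale over every window of `window`
    consecutive residues; returns one value per window start."""
    if window <= 0:
        raise ValueError("window must be positive")
    seq = sequence.upper()
    if len(seq) < window:
        return []
    values = [scale[aa] for aa in seq]
    running = sum(values[:window])
    profile = [running / window]
    for i in range(window, len(values)):
        # Slide the window by one residue: add the new, drop the oldest.
        running += values[i] - values[i - window]
        profile.append(running / window)
    return profile


def windows_above(profile, cutoff):
    """Indices of windows whose average exceeds a caller-supplied cutoff
    (hypothetical helper; the cutoff value is not given in the text)."""
    return [i for i, value in enumerate(profile) if value > cutoff]
```

A sequence whose profile never exceeds the chosen cutoff yields an empty list from `windows_above`, which corresponds to the absence of candidate membrane-spanning segments under that cutoff.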

[0296] Having described the preferred embodiments of compositions, organisms and methodologies employing a novel human gene NRHK1 (which are intended to be illustrative and not limiting), it is noted that modifications and variations can be made by persons skilled in the art in light of the above teachings. Therefore, it is understood that changes may be made in the particular embodiments disclosed which are within the scope of what is described as defined by the appended claims.

1 30 1 2493 DNA Homo sapiens 1 atgcttgggc cagggtccaa tcgcaggcgc cccacgcagg gggagcgagg cccagggtcc 60 cccggagagc ccatggagaa gtaccaggtt ttgtaccagc tgaatcctgg ggccttgggg 120 gtgaacctgg tggtggagga aatggaaacc aaagtcaagc atgtgataaa gcaggtggaa 180 tgcatggatg accattacgc cagtcaggcc ctggaggagc tgatgccact gctgaagctg 240 cggcacgccc acatctctgt gtaccaggag ctgttcatca cgtggaatgg ggagatctct 300 tctctgtacc tctgcctggt gatggagttc aatgagctca gcttccagga ggtcattgag 360 gataagagga aggcaaagaa aatcattgac tctgagtgga tgcagaatgt gctgggccag 420 gtgctggacg cgctggaata cctgcaccat ttggacatca tccacaggaa tctcaaaccc 480 tccaacatca tcctcatcag cagtgaccac tgcaaactgc aggacctgag ttccaatgtg 540 ctaatgacag acaaagccaa atggaatatt cgtgcggagg aagacccctt tcgtaagtcc 600 tggatggccc ctgaagccct caacttctcc ttcagccaga aatcagacat ctggtccctg 660 ggctgcatca ttctggacat gaccagctgc tccttcatgg atggcacaga agccatgcat 720 ctgcggaagt ccctccgcca gagcccaggc agcctgaagg ccgtcctgaa gacaatggag 780 gagaagcaga tcccggatgt ggaaaccttc aggaatcttc tgcccttgat gctccagatc 840 gacccctcgg atcgaataac gataaaggac gtggtgcaca tcaccttctt gagaggctcc 900 ttcaagtcct cgtgcgtctc tctgaccctg caccggcaga tggtgcctgc gtccatcacc 960 gacatgctgt tagaaggcaa cgtggccagc attttaggtg atgctgggga cacaaagggg 1020 gagcgtgccc tgaagctcct gtccatggcc ttggcatcct attgtttagt tccagagggt 1080 tcattattta tgcccctggc cttgctccac atgcacgacc agtggctcag ctgtgaccag 1140 gacagagtcc ctgggaagag agactttgcc tccctgggga aactagggaa gctgttgggc 1200 cccatcccaa agggtctgcc gtggcccccg gagctggtgg aggtggtggt cacgaccatg 1260 gagctacatg acagggtcct cgatgtccag ctgtgtgcct gctccctgct gctgcacctc 1320 ctgggccaag gcctgccttt tgcctgctcc gtggccctgg acaagttcct gatgatcctg 1380 ccagttttcc cagctatgaa gcgaggagct ggacacgagg tcctctggag tcaccctcag 1440 ggaggatggg ttgtgtcctc tgaagagggc tgcgctggtg caccacccgg aagccaaggc 1500 tccctgcaac caagccatca cctccaccct gctgagtgct cttcagagcc accccgagga 1560 ggagccactt cttgtcatgg tctacagcct gctagccatc accacaaccc aggggcccag 1620 tgggcttccg aagccgccag ccaggactgt gggaaggaga gggccataca gagcgctcac 
1680 accttcaccc acaaatcgga gtcagagtca ctgtcagagg agctgcagaa cgctgggctg 1740 ctggagcaca tcctggagca cctcaacagc tccctcaaaa gcagggacgt ctgcgccagc 1800 ggcctgggcc tgctctgggc cctcctgctg gacgacccca tcttggcact ccagcgcccc 1860 aggaaaaaga gagctccaaa ccacggaaag cccgggaaac ccaagaaccc tgccagcacc 1920 caaagtatca ttgtgaacaa ggcccccttg gagaaggtcc cggacctcat cagccaggtg 1980 ttggccacct accctgcgga tggggaaatg gcagaagcca gctgcggagt cttctggctg 2040 ctgtccctgc tgggctgcat caaggagcag cagtttgaac aagtggtggc gctgctcctg 2100 caaagcatcc ggctgtgcca ggacagagcc ctgctggtga acaatgccta ccggggactg 2160 gccagcctgg tgaaggtgtc agagctggcg gccttcaagg tggtggtgca ggaggagggc 2220 ggcagtggcc tcagcctcat caaggagacc taccagctcc acagggacga cccggaggtg 2280 gtggagaacg tgggcatgct gctggtccac ctggcttcct atgaggagat cctgccggag 2340 ctggtgtcca gtagtatgaa ggccctgctc caggagatca aggagcgctt cacctccagc 2400 ctggtgagtg acagcagcgc cttcagcaaa ccaggcctcc ctccaggtgg aagcccccag 2460 ctggggtgca ccacgtctgg gggactggaa tag 2493 2 830 PRT Homo sapiens 2 Met Leu Gly Pro Gly Ser Asn Arg Arg Arg Pro Thr Gln Gly Glu Arg 1 5 10 15 Gly Pro Gly Ser Pro Gly Glu Pro Met Glu Lys Tyr Gln Val Leu Tyr 20 25 30 Gln Leu Asn Pro Gly Ala Leu Gly Val Asn Leu Val Val Glu Glu Met 35 40 45 Glu Thr Lys Val Lys His Val Ile Lys Gln Val Glu Cys Met Asp Asp 50 55 60 His Tyr Ala Ser Gln Ala Leu Glu Glu Leu Met Pro Leu Leu Lys Leu 65 70 75 80 Arg His Ala His Ile Ser Val Tyr Gln Glu Leu Phe Ile Thr Trp Asn 85 90 95 Gly Glu Ile Ser Ser Leu Tyr Leu Cys Leu Val Met Glu Phe Asn Glu 100 105 110 Leu Ser Phe Gln Glu Val Ile Glu Asp Lys Arg Lys Ala Lys Lys Ile 115 120 125 Ile Asp Ser Glu Trp Met Gln Asn Val Leu Gly Gln Val Leu Asp Ala 130 135 140 Leu Glu Tyr Leu His His Leu Asp Ile Ile His Arg Asn Leu Lys Pro 145 150 155 160 Ser Asn Ile Ile Leu Ile Ser Ser Asp His Cys Lys Leu Gln Asp Leu 165 170 175 Ser Ser Asn Val Leu Met Thr Asp Lys Ala Lys Trp Asn Ile Arg Ala 180 185 190 Glu Glu Asp Pro Phe Arg Lys Ser Trp Met Ala Pro Glu Ala Leu Asn 195 200 205 Phe Ser Phe Ser Gln Lys 
Ser Asp Ile Trp Ser Leu Gly Cys Ile Ile 210 215 220 Leu Asp Met Thr Ser Cys Ser Phe Met Asp Gly Thr Glu Ala Met His 225 230 235 240 Leu Arg Lys Ser Leu Arg Gln Ser Pro Gly Ser Leu Lys Ala Val Leu 245 250 255 Lys Thr Met Glu Glu Lys Gln Ile Pro Asp Val Glu Thr Phe Arg Asn 260 265 270 Leu Leu Pro Leu Met Leu Gln Ile Asp Pro Ser Asp Arg Ile Thr Ile 275 280 285 Lys Asp Val Val His Ile Thr Phe Leu Arg Gly Ser Phe Lys Ser Ser 290 295 300 Cys Val Ser Leu Thr Leu His Arg Gln Met Val Pro Ala Ser Ile Thr 305 310 315 320 Asp Met Leu Leu Glu Gly Asn Val Ala Ser Ile Leu Gly Asp Ala Gly 325 330 335 Asp Thr Lys Gly Glu Arg Ala Leu Lys Leu Leu Ser Met Ala Leu Ala 340 345 350 Ser Tyr Cys Leu Val Pro Glu Gly Ser Leu Phe Met Pro Leu Ala Leu 355 360 365 Leu His Met His Asp Gln Trp Leu Ser Cys Asp Gln Asp Arg Val Pro 370 375 380 Gly Lys Arg Asp Phe Ala Ser Leu Gly Lys Leu Gly Lys Leu Leu Gly 385 390 395 400 Pro Ile Pro Lys Gly Leu Pro Trp Pro Pro Glu Leu Val Glu Val Val 405 410 415 Val Thr Thr Met Glu Leu His Asp Arg Val Leu Asp Val Gln Leu Cys 420 425 430 Ala Cys Ser Leu Leu Leu His Leu Leu Gly Gln Gly Leu Pro Phe Ala 435 440 445 Cys Ser Val Ala Leu Asp Lys Phe Leu Met Ile Leu Pro Val Phe Pro 450 455 460 Ala Met Lys Arg Gly Ala Gly His Glu Val Leu Trp Ser His Pro Gln 465 470 475 480 Gly Gly Trp Val Val Ser Ser Glu Glu Gly Cys Ala Gly Ala Pro Pro 485 490 495 Gly Ser Gln Gly Ser Leu Gln Pro Ser His His Leu His Pro Ala Glu 500 505 510 Cys Ser Ser Glu Pro Pro Arg Gly Gly Ala Thr Ser Cys His Gly Leu 515 520 525 Gln Pro Ala Ser His His His Asn Pro Gly Ala Gln Trp Ala Ser Glu 530 535 540 Ala Ala Ser Gln Asp Cys Gly Lys Glu Arg Ala Ile Gln Ser Ala His 545 550 555 560 Thr Phe Thr His Lys Ser Glu Ser Glu Ser Leu Ser Glu Glu Leu Gln 565 570 575 Asn Ala Gly Leu Leu Glu His Ile Leu Glu His Leu Asn Ser Ser Leu 580 585 590 Lys Ser Arg Asp Val Cys Ala Ser Gly Leu Gly Leu Leu Trp Ala Leu 595 600 605 Leu Leu Asp Asp Pro Ile Leu Ala Leu Gln Arg Pro Arg Lys Lys Arg 610 615 620 Ala Pro Asn His Gly Lys Pro 
Gly Lys Pro Lys Asn Pro Ala Ser Thr 625 630 635 640 Gln Ser Ile Ile Val Asn Lys Ala Pro Leu Glu Lys Val Pro Asp Leu 645 650 655 Ile Ser Gln Val Leu Ala Thr Tyr Pro Ala Asp Gly Glu Met Ala Glu 660 665 670 Ala Ser Cys Gly Val Phe Trp Leu Leu Ser Leu Leu Gly Cys Ile Lys 675 680 685 Glu Gln Gln Phe Glu Gln Val Val Ala Leu Leu Leu Gln Ser Ile Arg 690 695 700 Leu Cys Gln Asp Arg Ala Leu Leu Val Asn Asn Ala Tyr Arg Gly Leu 705 710 715 720 Ala Ser Leu Val Lys Val Ser Glu Leu Ala Ala Phe Lys Val Val Val 725 730 735 Gln Glu Glu Gly Gly Ser Gly Leu Ser Leu Ile Lys Glu Thr Tyr Gln 740 745 750 Leu His Arg Asp Asp Pro Glu Val Val Glu Asn Val Gly Met Leu Leu 755 760 765 Val His Leu Ala Ser Tyr Glu Glu Ile Leu Pro Glu Leu Val Ser Ser 770 775 780 Ser Met Lys Ala Leu Leu Gln Glu Ile Lys Glu Arg Phe Thr Ser Ser 785 790 795 800 Leu Val Ser Asp Ser Ser Ala Phe Ser Lys Pro Gly Leu Pro Pro Gly 805 810 815 Gly Ser Pro Gln Leu Gly Cys Thr Thr Ser Gly Gly Leu Glu 820 825 830 3 29836 DNA Homo sapiens misc_feature (6464)..(8402) Can be any one of A, T, C and G 3 atgcttgggc cagggtccaa tcgcaggcgc cccacgcagg gggagcgagg cccagggtcc 60 cccggagagc ccatggagaa gtaccaggtg ccgagtgttc cctgcgggga ggcgggagct 120 ccgtggggta acggtcgcaa ccctggagct acggccggcg gttccgaccg agggcggcga 180 ggggcccgcg ccctggccag tgtcggcctg cagctcctag gttgaacccg gggggcctcc 240 aacggtgacc tcctgggtgc cctttgccac tcagtttccc cctttgtgaa ttgactaagg 300 attctccagc cctggctgag tatttgaggg cgtggggcag ctcctctatc cttcgtgcct 360 ggggtctgtg cgcttgggtc caccgaggca ggacccccgg gaacatcccg agtacataat 420 tgggagcccc cagtccccta aaaacacccc tgcagcgtgg gtctgtgaaa atgtttgaga 480 cctaaaaaat tcacaaaaca caaaaggaaa gctgcaaaat aaaagtaaat gtttaattaa 540 atgcttctat acatgatata tacactttat taaatgttag attcagcatt tgtgaaaaat 600 gcattcgctt ggaaacagtt tgcgggttag atttttgtca ctttggaaga attgtctttg 660 tgtgagagga ctatagggcg ttgccagagg tgaagcagat gagcttctgg tggccagata 720 atttttaaag taaagttgtt tttcagatta aaaaaaaata gacgtttcag gaatatacct 780 gcttttggaa aaaaaaatag acttgattcg 
agatacggct ccattttact gtttaatttg 840 ctgcctaagc ttgaacgctc tcacaccagc tctgccctca gcccgctgtg gcttagaaca 900 gcagtccctg gccgggctcg gtggctcacg cctgtaatcc ccaacacttt gggaggccaa 960 ggcgggcgga tcacctgaga tcgggagttc aagaccagcc tgaccaacat ggagaaactg 1020 tctctactaa aaatacaaaa ttagccaggt gtggtggcgc atgcctgtaa tcacagctac 1080 tcgggaggct gaggctggag aatcgcttga acccaggagg cagaggttgt ggtgagccaa 1140 gatagcgcca ttggactcca gcctgggcaa caagagcaaa actctgtctc agaaagaaaa 1200 aaaaaaatag cagtccccaa cctttttggc acaagggacc agttttgtgg aagacaattt 1260 ttccacagat ggaggcggga ggatggtttt gggatgattc aagcacatta cacttagtgt 1320 gcagttcatt tctattatta tgttgtaata cataatgaaa taattacaca actcaccata 1380 atgtagaatc agtgggagcc ctgagcttgt tttcttgcaa ctagaccatc ctctcgggat 1440 gatgggagac agtgacggat catcaggcat ttgtttctca taaggagcat gcaacctgga 1500 tccctcacat gcactgttca caatagggtt cacactccca tgagaatcta atgctgctgc 1560 tgagctgaca ggaggtggag cttgggtggt aatgcgagcc atggggagcg gctgtaaata 1620 cagatgaagc tttgctccac tgcctgctgc tcacctcctg ctgtgcagcc tggttcctaa 1680 caggccacgg actaggttgg ggacccctgg cttagaatat ccagtgtcat gagcaggctg 1740 ctcacaaggc tggattacag actcctaaga cttttatggg ctccgagagt ccctaggctc 1800 aggcttccat cctcatatct cctcctctgg gtcctgccct ccctccccca atcctctgat 1860 gaatgtcagc ctccagcaat ccccggccca gccccctgcc ccatagcact tggtctctgc 1920 acagagttct ggcttggtgg ccatctctcc agatttggct caaatcacag gctctaagat 1980 cagaccccca gagttacccc aggcagtgtc ctgctttcta gtgacatagc ctcaggcaag 2040 gacctagctc cttgtgcctc agttttcccc aatgtaaaca cagaggtagc aatggtgtca 2100 actgcaaagg gtggctgtga agtgcttggc accatgccag gcacacaatg gcttcctgat 2160 tgtaccagtc acaagattgg ttactttctt gttggaaacc agttgggagg tggatgctgg 2220 aagttgaggt cacagaggtc tatagagagt gaataagccc tttttctctg ggaggttctt 2280 gcacttgagt gcccagctgg tcctcattgc aggttggggg agggatacag ggttgtagaa 2340 gagctccaga atcggcccca aataggtgag atcagagttc tgccattgaa aggcttcttg 2400 ccctccttgg gccatttcct ctattgcaag atggggtgga gccacttgct ctgccagcct 2460 gacaggggag ttagcagggc cagaaaagga gttggagctg 
ggcttttgga aagggaaaag 2520 ttgtgtgcat ttcctgaaag cttctctctt ccttgctgat aggttttgta ccagctgaat 2580 cctggggcct tgggggtgaa cctggtggtg gaggaaatgg aaaccaaagt caagcatgtg 2640 ataaagcagg taagaggcca agcctgtgca tcccatgccg ggtggttctg tgactgtgat 2700 tttccccaat acaagctctt cccatgttgg agaagcttcc tgatgcggca gctggattcc 2760 tcgctgctga cacttgcgga gactaatctg gttggggtag atgtgggggt gcgtgaagct 2820 ctgtcacctt gatggggaag caatgctaat ttttactcca acaccaccac ctcccaccat 2880 ttacgcatca cgtgctgtat gccaggcact gactcacttc atctcccgtc aaccctgtaa 2940 agcagaaaca atgaccctgt ttatagccaa agacagtgag gctcaatgca gcgccagact 3000 aggcaaggtc acacaatttc caagaggatt tgaattcagg ccacctcact gggggacacc 3060 gtgctaccca gtgctgggtc accagtttta ccaaaaggga gccaggccca gagaggatgg 3120 ggactggctc aaggtcacac agggctaagg tcacacacca ggccctgagc ccttccacca 3180 cactcctcac cagggctagc agggcatggg gaggtgtagg cctgcaggaa gacagccctt 3240 tgtgtgtccg agacagggag gtccagatca acagagggac tagggtgaga aagctgctgc 3300 aagagtccct ggcatgccct cctctctgtg gtggtggcag ggacccagca ggttcagggc 3360 tggccataca gcgggaggag ccttgtcagc agctgctact gggccaggcc tcagtccgta 3420 cagctccgca gtctcaccct gtatggctgg gcctggagtc cttgcccctg ccctgctccc 3480 ttgctggctg gctgtggggt tggccccctt gtctcacaag ccactggggc agtgtggctg 3540 actgccctct gagcagttaa ggagcttttt tttttgtttg gagatggagt cttgctctgt 3600 cgccaaggct agagtgcagt ggtgtgatct cagctcactg caacctctgc ctcctgggtt 3660 caagcaattt tcctgcctca gcctcccaag tagctgggac tacaggcaca cgctgccacg 3720 cccggctaat tttttgtatt ttagtagaga cagggtttca ccgtgttgcc caggctggtc 3780 tcgaactcct ggaactcctg agctcaggca gtccgcccgc ctcggcctcc caaagtgctg 3840 ggattacagg catgagccac cgcgtccggc ctgaggagct tttaaaaatg tcagccatga 3900 ctaggcatgg tggctcatcc ctgtaatcgc agcattttgg gaggccgagg caggcagatc 3960 ccttgaggtc agaagtttga gagcagcctg gccaacatgg tgaaacccca tctctactaa 4020 aaatacaaaa attagctgag catggtggtg ggtgcctata gtcccagcta cttgggagct 4080 gacgcgggag aattgcttga acccgggcgg cggaggttgc agtgagtcga gattgcgcca 4140 ctgcactcca gcctgggtga cacagcgaga ctctgtctaa attaattaat 
taattaatta 4200 aattaaaaat aaaaatatca cccaattatt tctaaataaa aattggggaa agagagtgta 4260 ggtaggagtt tatgggttct tccagtcttt tttcctaagc gtttgtaaac ttttttggat 4320 tcaggagaat ggtatcgtta aagttgatag catccttttt atattgcaaa catagtttca 4380 tgtcattccc acagcctcct cctctcttgg ctctggcaac tgtgtggcct cccgtcttgc 4440 ttgatgtgct gtaactaacc caagccatcc ctatcacatg ggctgagcat tgagcttgtt 4500 ctcaattttt cactttgtgt aaacagctct gataaagatg cttatggcat tatggttttg 4560 tttgtttttt tgttttgttt tgttttgttt tgtttagcat ccatgatttc cttaggaaaa 4620 attcctagga gtgagatttc tgggtcatac gatgtaactt tttttttttt tttttttttt 4680 ttgagatgga gttttgctct tgttgcccag gctggagtac agtggtgtga tctcagctca 4740 ctgtaacctc tgcctcccgg attcaagcga ttttcctgcc tcagccttcc tgagtggctg 4800 ggattacagg cacgtgccac cacatccagc taattttgta tttttagtag acggggtttc 4860 tccatcaaca tggagaggat ggtcaggctg gtctcgaact cccgacctca gatgatccgc 4920 ctgcctcggc ctcccaaagt gctgggatta caggcgtgag ccaccccacc cagccgcttt 4980 tttttttttt ttttgagacg gagtctcact cttgttgccc aggctggagt acaatggcgt 5040 gatctcggca cactgcaacc cccttctccc aggttcaagt gattctcctg cctcagcctc 5100 cgaagtagct gggattacag gcatgtgcca ccacgcccgg ctaattttgt atttttagta 5160 gagacagggt ttctccatat tggtcaggct ggtctcgaac tcccaacctc aggtgatccg 5220 cctgcctcgg cctccctaag tgctgggatt tcaggcgtga gccactgtgc ccggccacaa 5280 tgtaacattt tcaagtcttt ccatgttttg gccaacttca cctcctcata tgccccctag 5340 gacaggagga aaggaagaca ggaaggctca ctcagtgtct ttgccctgga ttccacggga 5400 cagtgccact ggcatctcag gtctctccat agatctggga acaattcact aactttacat 5460 gatggtctgc attcacccca ttataagagt acttcattca taagtctttt gagcaaaatt 5520 ctgggtgagg atctggtatt agagccagtg gtagtatata cctagggcct gtgccaccaa 5580 gcgtgctgca gactcaaagc tccgtgctgc ccttgccacc acccttccct ttccatgccc 5640 tccccacctc cacccggaga gggcacagga gagaagagca ctgtacattc catgcgtgga 5700 gacaaccttc cccatgtggg taaggaatga agtggtgaga ttgatgcttt cccaaccaga 5760 acaagatgtt cctgtttaaa gacggtctga aaatggatcc tttactgagt tcttggagcg 5820 tatattatgc tgtcctaaac ttatctttgc aaaaggagca aagatgttct cattgctcta 
5880 agtattttta gatctctgcc ttaggaatat cttccatttg tgccatatgg tgggggcagg 5940 aatggtggga gcctgtcact cctgctaaat agtgatgatg gtggtgatga tggaggtggc 6000 aatgatggtg gtagtggtgg tgatggtggt ggcagtgatg atggtgatga tgatggtgat 6060 gataatggtg atggtggtga tggtgtgatg atggtgatgg tggtcatggt gatggtgtga 6120 tgatggtgat ggtggtcatg gtgattatgg tcgtggtgat gatgtgatga tggtggtggt 6180 ggtgtgatgg tgatgatagt aatggtgatg gtggtgatgg tgatgataat agtggggatg 6240 gtgacagtgg tgttgatggt gtgatggtgg taatgatgat ggtaatgatg gtgatggtgg 6300 tgatgatggt aatgatggtg atgatagtga cgatggtgac agtggcattg atggtttgat 6360 ggtggtgatg gtgtgatgat ggtggtggtg tgatggtgat ggtggtagtg atggtgatga 6420 tgttggtgat ggtggtgatg atggggataa tagtggtggt ggtnnnnnnn nnnnnnnnnn 6480 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 6540 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 6600 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 6660 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 6720 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 6780 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 6840 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 6900 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 6960 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 7020 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 7080 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 7140 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 7200 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 7260 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 7320 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 7380 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 7440 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 7500 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 7560 
nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 7620 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 7680 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 7740 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 7800 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 7860 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 7920 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 7980 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 8040 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 8100 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 8160 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 8220 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 8280 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 8340 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 8400 nntcaatgat ggtaatgatg gtgatgatat atgatgatga tggtaatgat ggtgatgata 8460 gtggtgatgg tggtggtgat ggagtgatgg tggtaatggt agtggtggtg gtgtgatgat 8520 ggtgatgatg gcagtgatgg tgatgatggt gatggtgatg atggtgacag tgatggtggt 8580 gttggttgtg gttgtggtaa tgatggtgat ggttgtggtg gaggtggtgg ctgtgcagat 8640 gatggtaatg gtcgttgtgg tggtggtgat ggcggtgata acggagatga tttgctacat 8700 gtttattaag ctcatgctct tgtgcccttg caggtggaat gcatggatga ccattacgcc 8760 agtcaggccc tggaggaggt aactctcagg gtagttttcc ctctggaaga gctcaatgga 8820 gcatacacag actgtgttct gtaccttctt gttgagtgcc tggatgaaga gaaggctgga 8880 gggagggata gagcatcagc accagttttg cctcagctgt gaagccagca gccccaggtc 8940 atgaagggag tccatgcccc aaacactcac tgctaaatgc aggtgccgac aacttagaac 9000 atatgttccc agagaacata aaatttaaat attgggctgg gcacggtggc tcacatttgt 9060 aatcccacca ctttgggagg ccgaggcggg tggatcactt gaggtcagga gttcaagacc 9120 agcctagcca acatggggaa accctgtctc aaccaaaaat acaaaaaaat tagctgggca 9180 tggtggtggg cacctgtaat cccagctact tcgagaggtt gaggcgggag aatcacttga 9240 acctgggagg 
cggagattgc aatgagccga gattgcacca ttgcattcca ccctgggtga 9300 cagagcaaaa ctttgtctca aaaataaata aatattggcc gggtccctag gttaatctac 9360 caataacttc tttaaatatt ttttcttcaa tataaaatta ttaattacag cagaaaattt 9420 taaaaataca gaattgggtt ttcttttgtt tttacctttt tttttttttc tttaatagca 9480 atggggtctc accattttgc ccaggctagt ctgaatttct gggctcaagc aatcgtccca 9540 cctcggcctc caaagtgctg gggtacaggc atgtaccacc acacccatac cagaactgtt 9600 taacaaaaca aatataaatt acccttaatt ctaccattga gagctgccca gcagtaacat 9660 actcgtttac aataatagcg ataacagctg cacttcggtt tgattggcaa agcctccgag 9720 tagcctttct ttgtgtttct tgaatttccc acgagcttga aggtgtcccc tcttatctca 9780 tggtcacttg tttgtcttct ggaattgctg gctctggagc ttggccagca aacattattg 9840 ggtgtatagt gtgaccgaag cacagtggtg ggcccaggct actcggtaac aaatgggaag 9900 aaagagcatg gggcctgccc agagccgcac gccgccgtgg cttttcacgt tgccgattcc 9960 catccacagc ccactaggta ggccacctgc tttcatgccc aaccttccac cccaagcagt 10020 tgtctgtctg gctccatgtc tctaaggcag ccctatctgc tcctgtttgg gcatgtttca 10080 aagcactttc cgccagggca ggggcaccgg gacctctccg tgtgcccacc tcccggtgtc 10140 tgcaccacct cctccagcca gccctggctg tggctgatcc ccaccttcct ggcactgcct 10200 gccacagctc actcaccccc catggtatcc ctgtgaggca gattccacta ggacccccat 10260 tttcagatga ctatatgagg cccagtcacc cagcacagcc agctcatgct ggagacagga 10320 cccacatcgg actgcctggc tcccaaccac cgtgcagctt ccctgagccc actccctggc 10380 ccagctcaga gcccgaggct tgcatgtttg ttgggatgtg tgacagagaa gcccgagctg 10440 agaaaggcgt ggagaggcac tgacttctcc gtttcctctg ctctatcctg gcagctgatg 10500 ccactgctga agctgcggca cgcccacatc tctgtgtacc aggagctgtt catcacgtgg 10560 aatggggagg tgggtcagag ctgacaccta cgggctcagc cgccacgcag tgggctgcag 10620 gaccaagcag actgagccca gagcacgccc accccccact gtcagaatag ctcgtgtggc 10680 aatggcagtg actgtaaacg tggccacccc tgacctaaca ctcactgggg ccaggtacca 10740 tgctgggggc tttcggtgca tagtctcata ggagccccaa aaacctcatc gtcaggagag 10800 tttttttttt ttttggacaa agtctcgctc ttgtccccca ggctggagtg taatggtgtg 10860 atcttggctc actgcaacct ctgcctcccg ggttcaagag attttcttgc ctcagcctcc 10920 
cgagtagcca ggattacagg cgcgtgccat gatgcccggc taatttttgt atttttagta 10980 gagacggggt ttcaccatgt tggctgggct ggtgttgaac tcctgacctt aagtgatccg 11040 cccgcctcgg cctcccaaag tgctgggatt agaggcgtga gccaccacgc ctgaccagga 11100 ggggttctta acccatttga cagaagaaga aacagaggct aacagaaaca agttgctcca 11160 ggttacccag ctagtacgtg gccaagccag agggccaggc cagatgggcc tgactcccgg 11220 gctctgcacg cagccaacga gcttgtcctg gtgagcctgt gcctctgatg acagagtttt 11280 tactttcatg gaaggagcta gttgacctcc gtctccacag ccacccgaca ccggtgccgt 11340 cctgggcctg tgcgcccctt actcctgcag cccctgtgca gcttactgac cagcaccaca 11400 tccgtcatca cgtgccagga gggctgcagg gcaggaagta ctgtccctgt gctcctgatg 11460 gggcctaggg ctccggttca ggagctctcc aaggccacac agctagtaaa cagctgactg 11520 gggacggaga ctcaggtcca gcagccgggt cctgtaggct ggatgacttc ttgtctttac 11580 aaggagccag gagctttcca gtcacttctg atgggactga ggcagacagc gggaggctga 11640 gcaggcacaa ggggtgctgg aggtaaagag aggctgagaa gccttctgcc aggcgccagc 11700 ctgcatgaga tgtccacact ggtgttccca cctggggcca acaatcccgg gtccgagcag 11760 gaaaggcccc tcacacgctc cctctgctcc agcgggtctt ggggagggga cctcactgca 11820 tgcactccca aagattttct gagcaccccc taatgctctc aaggagggtg cagagaagcc 11880 aagaggacat ttccctatgg gagggagcac agaattgggg gccctgacct ggtctgctgg 11940 ggctctggcc agggcagggt cctcagagga aatgggtctg agctgagccc taaagggtgc 12000 atagttatcc ctatgaaggg ggtgggtagt ggtccaagca gaggggtcca tgtgtgcagt 12060 ggccagggga caagtgcagc ttttgggaaa gtccaagtag cttggtgtgg atggagtgtg 12120 gagtggaggt ggagcctggg aatggggaga ggagagagag gaagctgaag gtggggcagg 12180 aggggcttgt agccccctcc agtgggcagt gctcccacgg ccactgcaaa tggcagctga 12240 gcgcagggag gccctggagc aggtgagcct gcagaagcac cggggcctgg gcgtcctttg 12300 gctaagggcc tcctgtccca gcagatctct tctctgtacc tctgcctggt gatggagttc 12360 aatgagctca gcttccagga ggtcattgag gataagagga aggcaaagaa aatcattgac 12420 tctgaggtga ggtcctttgg ggcaccaggc ctgggggcca cctagacctg tgacacaggc 12480 cctgcggtgc agggcaaagt aacagcggga gggcaggcac catggagtcc agccttgttt 12540 ttttcttaaa tgtgtgcctc gaggcattgc actctaggta atgtgtgcag 
atcttaagtg 12600 cacagtttga tgcactcaca caacttccac ccagatcaag acagaaggcg ttcctaacac 12660 tagaaggttc ccagtcggtg accatgattc cagattgttc tgccggtccc tgaagttcct 12720 gtaaatggac tcgtccggca tgctgccact cctgtctggt ctccttccct cagcctgctg 12780 ttgtgagccc cgcggtgctg ctgcatgcac cagcaaatca tgtgttcatt gcttgctgcc 12840 actctgctgc ctgattgcgc tgcaggctgt ttacctagtc tcatttgggc tgcttccagt 12900 ttggggctat tgtgaataag gctgctatga gcattgctgg aagacactca cttttctggg 12960 atgaatacct aggagtggaa ttattgggtc gtagagtaca tgtgtgtagc ttcagtggat 13020 gctccaaaca gatttccgac ttggtttggt tggccccatg tttactctca caagttgtga 13080 gcattcccga tccacatgga ggccagcact tcattgtgtc agtcttgttg ttgttgttgt 13140 tgttgagatg gagtctcact ctgtcgccca ggctggagtg cagtggcacg accttggctc 13200 actgcaacct tcgcctccct gttcaagcga ttctcctgcc tcagcctccc aagtagctgg 13260 gactacaggt gcccaccacc acacccacta atttttgtat tattaataga gacaaggttt 13320 tgctatgttg cccaggctgg tctcgaactc ctgacctcaa gtgatccacc cgcctcggcc 13380 tcccagagtg ctgggattac agatgtgagc cgccgcgcct agccagttca ctttcttaat 13440 gatgtctttt gatgatggaa agtcctaact gtaatggagt tcgctttccc aatgctgact 13500 cttatggtta gtgcctttgg agtttaagaa gcatttcctg ctccaagatc atgaagatac 13560 tctcctctgt cttatggaag cttggttatt tttgccttca catttagatc tttcatctac 13620 cccagatgaa tgctacctgc ttttaccctg agaactgtgt ttggggggac catgtacccc 13680 tgaggggctc ttcggggcac acagctcttc tcttaccatg ggcctcagag gcaggcccgg 13740 agtgagtttc agactttgtg agtgaagccc ttcaaaacac gaaatattcc cagaaaccca 13800 gtaagtgcag cagacctact ctaactgggg gcagtgggag gacgcccaca tcctgccccc 13860 tcagcccctc tctgacaccc cagggtggcc ctgaatccag gggccctagg agcccagctt 13920 tagaatcacc gcgctgggta ctcgatggag cttgtctctg atgcagaaca ctcctagcat 13980 tctctctcag ggctcttttc atttgaatga cctagaggat tgagctcatg taggcactga 14040 aggcttccac ctctcccata cccgcaaggc cgatctgcct tcagctccca gcaagtgtgg 14100 ggcagcgcgg gccacagagt agggtgcagg gatggggccc ctgcagcacc cagggtctct 14160 ggtatggaga cagcagtgtg gagtctggaa actcagagtc cttctggctg ccgccgcggc 14220 tttaccatct ggagagccac cacgctgaag 
cctcctccac cctgagcgct tggctggctt 14280 caggcctgtc tcaagatgca aggagaggat acaccaccat cctgctggct gctctgagtg 14340 tcacccccct gaaagcagca cagggtgccc ctcccatcct ggcaccccct acttctcccc 14400 cagtggatgc agaatgtgct gggccaggtg ctggacgcgc tggaatacct gcaccatttg 14460 gacatcatcc acaggtaagt ggggcccctg acctctgcgg actggctggc tgcttcggga 14520 gaaaaggcac tgaggccact cgggtgccag tgcccgtggg caggatctgg ggagaaaggt 14580 gcaccgggcc agtgcagcca ggataggatg ggaccttaca gagctcctcc cgggcttgaa 14640 agaggctctt ccaagtggtc tcaagccatg tgcacacgca cagctgcatg gggtgtgcgc 14700 tagccaggcg ggctgctcta gagttcgtgg aaaggaagga ggcaaaagcc ctgccaagaa 14760 gagagaccgg gttgcctgcc gtggggccag tgtgggctga gtgggccctg ctgagccttt 14820 gacccccagc ggcacaactt tcaggctgga gaatccatgg tctgaagggg ctgggagatg 14880 gctccaattc tgaacaccaa tatcttattt aaagaggaag agggaaacta tgcagctggg 14940 cgtggtggtg cacgcctgtg gtcctagcaa catggaggct gagatgggag gattgcttga 15000 agccaggagc ttgaggctgc agtgagctat gatcgtgcca ctgcacttca gcctgggcaa 15060 ccctgactta aaacacacac acacacacac acacacacac acacacacac acacacacca 15120 cgcagaccat acgtacaaag ggaatgctca cattccacat ccagtgttca tgtcactaga 15180 cgtcgcacaa tggccaaaaa tcaatcccca ggcaaacgtg tagctgagat atctaaggag 15240 gtgaatgtgt caattaaggg gcctcttcgg agtgctgggt gtgttcctac ttgtggttga 15300 ggatttttcc cccgatttaa aataattgag tatatttctg tacataggaa acaactgctt 15360 aaaaagtagg ctgagatggg gcattttgtg gaggagagga ggatgtgggc tgctgctgca 15420 gaaccaggtg gggcagggag cagagagtca ggctcagcac acacactggt cccacctggg 15480 gttgtgggtg gtggctgccc aggtggcccc ttggcatcca gaggcaaacc cacctcttgg 15540 tttcaggaat ctcaaaccct ccaacatcat cctcatcagc agtgaccact gcaaactgca 15600 ggacctgagt tccaatgtgc taatgacaga caaagccaaa tggaatattc gtgcggagga 15660 aggtggcagg ggctccccca ggttgtggga gagggggttg gcgcctagaa tccaggcggc 15720 gttggccact ctgggtgctg gagtgaggca acatcaaaca gctgtttgct cagaaggtcc 15780 ccacaaagcc ctggccttgt gtaaactcca aagagacctc ctttgggttg caactgagca 15840 ggcgtgccac caccagggca gaggcagggc cccacagaca cccaacattt gagagaaaca 15900 aagtcgtggt 
tgtttgtggt accccagaaa atgttgcctc tcatggaggg aaaagaaagt 15960 gtcagaagga aggatatgaa aatgcccagg acggagggag gtgggggggg tcagcccccc 16020 gcccggccag ccgccccgtc cgggagggag gtggggggct cagccccccc gcccagacag 16080 ccgccctgtc cgggagggag gtgggggggt cagccccccg cccggccagc cgccccgtca 16140 gggagggagg tgaggggcgc ctctgcccgg ccgcccctac tgggaagtga ggagcccctc 16200 tgcccggccg ccaccccgtc tgggaggtgt gcccagcagc tcattgagaa cgggccatga 16260 tggcaatggc ggttttgtgg aatagaaaag ggggaaaggt ggggaaaaga ttgagaaatc 16320 ggatggttgc tgtgtctgtg tagaaagaag tagacatggg agacttttca ttttgttctg 16380 tactaagaaa aattcttctg ccttgggatc ctgttgatct atgaccttac ccccaaccct 16440 gtgctctctg aaacatgtgc tgtgtccact cagggttaaa tggattaggg cggtgcaaga 16500 tgtgctttgt ttaacagatg cttgaaggca gcatgctcgt taagagtcat caccactccc 16560 taatctcaag tacccaggga cacaaacact ctgcctagga aaaccagaga cctttgttca 16620 cttgtttatc tgctgacctt ccctctacta ttgtcctatg accctgccaa atccccctct 16680 gcgagaaaca cccaagaatg atcaattaaa aaaaaaaaaa agaaagaaaa tgcccaggac 16740 ggagggtctg tgggtgccag gcactggctg cgtgtacatc actgagtcct acaacaaccc 16800 aggagatgaa ggggtgggtg gcaaggggag acgagttctc gttcctttga aaagatggcc 16860 agagaaaggg ggctggagag atcaaccaca gaggaggagt ccagagtccc aggatggcag 16920 ttgctggttg cactctgtcc tttttttttt tttttttttt gaggcggagt ctcgctctgt 16980 cgcccaggct ggaatgcagt agcgcaatct cggctcactg caagctccgc ctaccgggtt 17040 cacgccattc tcctgcctca gcctcccgag tagctgggac tacaggcgcc tgccactggg 17100 cccagctaat tttttgtatt ttttttagta gagaccgggt ttcaccatgg tctcgatctc 17160 ctgacctcat gatctgccca ccttggcctc ccaaagtgct gggattacag gcgtgaacca 17220 ccgcacccgg ccacactcag tccttggtag acagaagatg aatgagtaga tgggtgggtg 17280 tgtggtttgg tgggtggtag gatggatagg tgggtgggta agtggatgga tgatgggtgg 17340 gtgagtggat ggatggatag gtgggtggat agatggatga ataggtgggt gggtgggtga 17400 gtggatggat ggatggatga gtggatggat gaatggatgg atggatgagt ggatggatgg 17460 atggatgggt ggatggatgg atggatggat ggatggatgg atggatggat ggataggtgg 17520 gtgggtgagt ggatgggtgg gtgggtgagt ggatgggtga gtaggtgagt ggatgagtgg 
17580 atggatggat gagtggatgg atggatggat ggatggatgg agatggatgc atgcatgcat 17640 ggttggccgg atggatgaat gggagggtag gtaagtggat gggtgggcgg gtggatggat 17700 aggtgggtgg gtcagtggat ggatagatgg gtgggtgagt ggatggatag gtgggtgggt 17760 gggtgggtca gtggatatat ggatggatag atgggtaggt gagtagatgg atggatgggt 17820 gtgtggttag agggatgggt gtgtgggtgg atgggtgagt gcatgggttg tggatggatg 17880 gttgggtggg tagatggatg ggtgggtggg tgcatgtgga tggatgtgtg ggtgggtagg 17940 tgtatggatg aatggatgca tgggtgagtg tgtgggtaga tgggtgggta tgtggatgga 18000 tgggtgggtg agtgagtgaa tgggtgagtg agtgaatgag tgtgtaggtg ggtgagtgga 18060 tgggtgggtg ggtggatgga tggatggatg gatggggtgt gcgtggatgg atgggtggac 18120 agacgggcag atggttggtt ctattggagg tgtagatggc atgcgtcctt ggagtccagc 18180 cctttactgt tgggctgggg aatggaggtc cagagaagga ggggctgcct gaagccaacc 18240 agggactgat ggactcagag gagtctgctc ttttgcctcc ctgtctgggg ttccagttga 18300 gaaagtaggg cagagcaact gtaactttgc ccccaaggtc ctgacattta gaaggggcaa 18360 gaagtttaga ggggtgcaca gtttcttggc acgtgcctct tccaactcct tctacagcca 18420 tccagggcac acagacacac cacctatatg ggccagcctg gtgggcaccc accaagatgg 18480 acagcttcag tggctccaga tcaacacaaa gctcccgctg attggggcct cttcctcccc 18540 acagttaata ttctccacct cttctgagaa gaggacctgc agggcttgtg tttcaagctg 18600 cttgcggggg gccaccaaag gggatacagt gctgggcagg gtgactctgt caagcccctg 18660 cccccaggga gcaaaggact cagggatccc accttgcttt taccaacaga cccctttcgt 18720 aagtcctgga tggcccctga agccctcaac ttctccttca gccagaaatc agacatctgg 18780 tccctgggct gcatcattct ggacatgacc agctgctcct tcatggatgt gagccgccct 18840 ccctccccca caccccacat gctgttcccc acgcgcccag gcctggggaa aaggcttggc 18900 ctcaccctgc ctcccctctg catcccttcc cctggctctc tgcaggctgc acagagccct 18960 cttctccacc tgcgaggggc ctgccctcct cagaacccct cagcttgcag cacctgctgg 19020 gctctagcag gataatgaca gcagtggtaa tattcagacc atcccacgcg accctcgcag 19080 cagccctcca ggtggtgtca ctgactcttg atggagaaaa gccaagttca ggtgcccttg 19140 tgagcatgaa ggctgcacgg agttgcaagc aacgggaacc cagtgtgggc ctgaacacac 19200 ctggctgtct catgcacaag ccccaggctg gtgtggaggt 
gccttctctc ctcctgcaca 19260 tccttagcat gcagctcttt ctctcatccc tgctggggcc cctgcaccat ggccacagcc 19320 tgtgggcagg aaggagggga ggcagagggc cccactggcc ccgcgagcac tccaaggtca 19380 ctctggctgc agggaggcag ggaagtccag cctgtcgctt cctatcctct atatgcagaa 19440 gagaaaagtg gggaaggcct gccatgccca aaacaaggaa gctccccttc tccgcagcac 19500 cacctgcagg caccgaggtc cccagaaagg acagacacct ggctggaccc aggttcccca 19560 tggtctccca gacccccaga ctccacctct gagaagcacc ttgccactcc cttcctttga 19620 aagactccca gggaaatgag agccttccca cttcggaggc tgtgtgacat cctggaaatt 19680 agcctgagct ccagccccag cccaggcagt gtgaccctgg gcatgctcac actctgtgaa 19740 atgggcatgc tgtcttactg gctgggtcta gatcaggggg ctttcttggc aggactccac 19800 cccgggagac aacccgctgg cttctctgaa actccatttc ttcttatgga agagcttggg 19860 gcccctgggg tctctgggca ttcttgtaga cggtggccac acctggctct ccctggtctc 19920 ctcctggatt tcttggtccc tggtcgtccc ctgcccatgc tgggacctag ttttcattta 19980 cttaagggaa tacacagagc tgtcctctct ccgtgcaggg cacagaagcc atgcatctgc 20040 ggaagtccct ccgccagagc ccaggcagcc tgaaggccgt cctgaagaca atggaggaga 20100 agcagatccc ggatgtggaa accttcagga atcttctgcc cttgatgctc cagatcgacc 20160 cctcggatcg aataacgata aagtgagctc agggtcgggg tttattttaa cctgtggatt 20220 tatctttcaa catctctcca ccctaataca agcacagcta gttggctttg taacgcctca 20280 aagaactcca tcacagatgc cctgattatc cctgcacagc tgggctttgc ccagttctgg 20340 ctctcccaaa ccgtgctgcg gcgagtaatc ccgaatgtac ggtggagtga gcagactgac 20400 ccccaggagg cacaggaggc gtagccccca ggacccacga cacttttagg gttccagaaa 20460 aaagttttca ttctacataa aaaaaaaaat tcctaaagac aatggtcacc tttaaatttt 20520 tcattctaac ttactttaaa atcagaagac aaaagtaaat acataacact ggccggggcg 20580 gtggctcatg cctataatcc cagcactttg gggggctgag gcgggtagat cacttgagct 20640 caggagttct agtctagcct gggtaacatg gcgaaacccc tgtctctacg aaaaatacaa 20700 aaaattagct gggtgtagtg gtgcatgcct gtgttcccag ctactgggga ggctgaggca 20760 ggaggatcgc tcgagcccgg gaggcagagg ttgttgcagt gagctgagat ctcgccactg 20820 cacttcagcc tgggtgacag agtgagaccc tgtctcaaaa acaaaacaaa aacatataac 20880 aaagaatcca ggccggacac 
ggtggttcac acctgtaatc ccagcagttt gggaggctga 20940 ggtgggtgga tcacttgaag tcaggtgttc gagaccagcc tggccaacag agcgaaaccc 21000 cgtctctact aaaaatagaa aaaaaattag ctgggcatgg tggggtgcgc ctgtagtccc 21060 agctactcag gaggctgaga cagaagaaat gctggaaccc gggaggtgga ggttgcagtg 21120 agccgagatt gtgccactgc actccagcct aggtgacaag agtgaaactc catctaaaaa 21180 aaaacccaaa caaaacaaaa caaaaaaccc aacatatagc aaggaatcca gcctgggtca 21240 tattcatctt tataccaacg cagttgtaaa atctgggttt tcatgtttct atggaggcag 21300 gggacaagag caaaagtgcc agggccccgg actgtccccc agctctgtga gctgaggccc 21360 tgcctccatg gagtacgtcc ctgggtgtgg aattgctggg gtgcttgccg gacacactgg 21420 ggacactatg agagaccctg ccaaatgaat cccaaaacag ggagttcagt gtccagtgtc 21480 cgcaccaatg ggcagcccgg agccagaggc agagggagag ccccacgggg aggtggcagg 21540 gggcgctgct gggttactca gccctctctg ctcctctgct agggacgtgg tgcacatcac 21600 cttcttgaga ggctccttca agtcctcgtg cgtctctctg accctgcacc ggcagatggt 21660 gcctgcgtcc atcaccgaca tgctgttaga aggcaacgtg gccagcattt taggtgatgc 21720 tggggacaca aagggggagc gtgccctgaa gctcctgtcc atggccttgg catcctattg 21780 tttagttcca gagggttcat tatttatgcc cctggccttg ctccacatgc acgaccagtg 21840 ggtaggaaca gtttccctcc atccatccct acacactgcc caggacaccg ctctgctcaa 21900 atatcctcca tagctcccaa ctacctataa cacaaagttc ctctccatag cctggccccc 21960 acctgttttc cctccgctcc tgctacacaa atcctctatg tagctcaatg gcctagtcac 22020 tgcccacgcc tcccacaacc ctctgctttt gcttccacag cctgggtggc acacagtggt 22080 cccaggacat cttcctcaca cagcaccact tccttcctgc gtgcttctat gtgcccagga 22140 tgagcagaag tgcgctccat catctgtgtg ctccaagaca gagacttggg ttctattaag 22200 gaaaagttgc ttggtctcag tggcctcatc tgtaaagtgg ggatggtaac agcccctccc 22260 tctcatcctg aacctgtgga tctgaggagg ggatgcacac acagcagcca gcccagtgtg 22320 gtgccgagaa acagagcccc gaggccctgg tcctcagaaa ggtccctccc ctgccttcct 22380 gtccctgcag aggtcatgca gaaattctct ggctggcccg aagtccagct cagggccatg 22440 aagaggcttc tgaaaatgcc tgcagatcag ctaggtaggc cccaccctgc acccctttcc 22500 cagctgctcc cctaggggca gaagctatgg tccggcctgt ggggagctga ggctggccct 22560 
caccccgggc tctcctcgcc agtgctttat tgcagcgtgg aggcgtgcat gtgtccccag 22620 aagagtcccg tgtctctgct atctgcctgg ggaagacagc agagaagggg aatgggtggt 22680 gtggcagccc tcacatgatt ttaatggagc cacagacatc ccatcttccc cactgtccct 22740 atgaggggta tctgagttgt ttctcagttt ccactattat gaatgatact agaacggaca 22800 ccctggtgtg tatgtatctg tgcacttgtt tccgtagcac agattcctag atgttcaaga 22860 gtgtgaatac tttaactttt cacagataca acttgcccac ctattaagaa tgcatggcct 22920 ggcgcagtag ctcacgcctg taatcctagc accttgagaa gccaaggcgg gaggactgct 22980 tgagcccagg gatttgagac cagcctgggc aacaaaggga gagcccattt ctacaaaaaa 23040 taaaaaaatt agccaggtgt ggtgacacat gtctgttatc ctagctactc aggaggctga 23100 ggcaggagga ttgcttgagc ccagggaatt gaggctatag tgagctacgc ttgcaccacc 23160 gcactccagc ctagaagacc ctgtctcaga aaacaaaacc aaacccaaaa agattgttac 23220 tgctcattca tggagagtgt tgggaaaagc agtttttttt tttgtttttg tttgtttgtt 23280 tgtttgtttt tgagacaggg tctcgctctg tcccccaggc tggagtgcag tggtgcgatc 23340 ttggctcact gcaacctccg cctcctgggt tcaagtgatt ctcctgcttc agcctcccaa 23400 gtagctggga ctacaggtgt gtgccaccac acccagctaa tttttcgtat ttttattaga 23460 gagggggttt caccatgttg gccaggctgg tctcaaacgc ctgatctcaa gtgatctgcc 23520 tgtcttggct ttccaaagtg ttgggattac aggcgtgaga caccgtgctc ggccaatttt 23580 taaaacattt gtgccaaaac atgctttcat aaaatctttc cattcaacct ttttcacctg 23640 cctgaacatt accttcacat atccatccat ccacccatcc acccatccat ccgtctgtct 23700 atccatcaga cctggattag gaatccactg aggtttgttg cagtggctcg ggcctcagag 23760 gtgacaaggc ccagccctgg cctttgagta ggtagcagag gcctcatatg ggcctaattt 23820 accattccct ccctcccctc ctcctcttcg accccttttg tagctcagct gtgaccagga 23880 cagagtccct gggaagagag actttgcctc cctggggaaa ctagggaagc tgttgggccc 23940 catcccaaag ggtaggtctt tcccaccacc cggagccaca cctccctcca cgccttgctt 24000 agaaatgggc ttgcagccca gcgcagtggc tcatgcctgt aatcccagca ctttgggagg 24060 ggctgaggtg ggcagatcac ttgagatcag gagttcaaga ccagcctggc cagacatggt 24120 gaaaccctgt ctctactaaa aatacaaaaa ttagccagac gtggtggcgc atgcctgtaa 24180 tctcagctac gcaggaggct gaggcaggag aatcgcttga acccaggagg 
cggaggttgc 24240 agtgagctga gatgatgcca ctgcacaacg gcctaggcga cagagtgaga ctctgtttca 24300 aaaaaaaaaa aaaagagggg ggggtcttgc ttcgctccac actccaggtg ccaggacttc 24360 atccttgttg ctctcatgag cctagagtgg agggatggct gcctggccac tgcccctcac 24420 ccagtcccca gcccacaaca gtttctggca cagtggcagg gtggatggag cccacccacc 24480 catgtccacc ctcagggcag ttgcagccaa gggctctgga atagactggc taggttcaaa 24540 ctgctgaaga gcaggtgctt tcatcctgct gaccccaggt tcctcatctg catatggagg 24600 gcagccttgg gaggggccac ttcacagggc tgtgggcagc acagagcagg acacccgtgg 24660 cagacatggc atgcactcca tggacctagc gctaatcctc attgtccttc ccccttctat 24720 tcacccacct agggccctgc aggctcctac cagcctctgg gggccctggt cgggtctata 24780 tgcccccgat ctggcccaaa atgagtctcc cctgtgccgc ccgccctgcc aggtctgccg 24840 tggcccccgg agctggtgga ggtggtggtc acgaccatgg agctacatga cagggtcctc 24900 gatgtccagc tgtgtgcctg ctccctgctg ctgcacctcc tgggccaagg tgggtgccaa 24960 accaggccag atggggtcgg ggaggctgtg cgctgcttcc tgcagctgtg cctcctgggc 25020 caaggtgggc accgggccgg gtgggttagg ggaagccatg ccctgctccc tgctgctgca 25080 ccttctaggc catcttctag gccaaggtgg atgccagggc caggccagga gacactcctg 25140 gtggcctagc tctgccccca ccacctggtt ggcatctaac cactggagag tccatgccat 25200 cctgtgccca tcagacccca tcctggatgg cagagagggc acaggccagg agcttggaga 25260 cgcggatccc acccaggcct gccttttgcc tgctccgtgg ccctggacaa gttcctgatg 25320 atcctgccag ttttcccagc tatgaagcga ggagctggac acgaggtcct ctggagtgac 25380 cctcagggag gatgggttgt gtcctctgaa gagggctggt aggagggcag tgctgagttc 25440 atttcactgt cctgatggaa gaggttggag ctgagagatt gagcctccta tgagagacat 25500 gggttgttaa aagagttgaa ttagctttga tgattttttt tgaaacaaaa agtatttagt 25560 tacttttttt tttttttttg agatggagtt ttgctcgtca cccaggcgag tgcagtggcg 25620 cactctcggc tcaatgcaac ctccacctcc caggttcaag cgattctcct gcctcagcct 25680 cctgaatagc tgggactaca ggcacccacc accacgcctg gctaattttt gtatttttag 25740 tagagacggg tttcaccatg ttagtcaggc tggtcttgaa ctcctgacat cgtgatccac 25800 ccgcctcagc ctcccaaagt gctgggatta taggcatgag ccaccgcgcg cggccacctt 25860 tctagtttca ctgttggaag tttggagttc 
catgcaatgt tgaaattgtg ttcagtgctg 25920 cctgactggc tcccagggac caggatgcgt ggcctggccg ggcagggctc ccttccggtc 25980 cttcactcca ttaggccaca gggattcatg gaggcctgct ctgggtcaga acaaggcaga 26040 cctcggtttc cttcatgcaa agtggagatg ctatccccca gcctgtgagc cttgtgtgtc 26100 tggccccatg cctgagctgt ggggctaacc ccaggcgtct tcctctggct tgagcagcgc 26160 tggtgcacca cccggaagcc aaggctccct gcaaccaagc catcacctcc accctgctga 26220 gtgctcttca gagccacccc gaggaggagc cacttcttgt catggtctac agcctgctag 26280 ccatcaccac aacccagggt gtgtctgcca gccacctcct gccccaccca cgctccagga 26340 cagcccttcc caggggtctt ggaagggttg gtttggggta taggtgggtt ggacaggaca 26400 gtgctgggcc tcctcctgag atacatggtg gcatttggcc gtcttcattt ggccacccca 26460 aatgctggtc gcatcctttt ccatcttgat gacaagcttc cactcttgaa gtcactggtt 26520 ccctctacag acatgctagg cgcagctgtg ggcttcacac caatgacatc tctttcccac 26580 acttcctgcc ccttctggga ggctggggct caaatgccct gtgtgtctcc attccatagg 26640 gcccagtggg cttccgaagc cgccagccag gactgtggga aggagagggc catacagagc 26700 gctcacacct tcacccacaa atcgggtggg cactgttctc cccaacagga agctgggcct 26760 cgagagagcc taaggacagt tgccaggagt ccatgcagca gggttcaggg ctggggtctg 26820 ggccccagca ccctctttac tgcacagact ggataactga tgatacatgg ctgatctcac 26880 tttggggagt gaaaggaggc actaggaata gatgtcaact ggaaccctca ggcaaaatgg 26940 atgtcagttc atcctaccgg gatagggccc cgtcatggtt ccatcctgga aggcacaggc 27000 tggctctgtg agcccaggag gcagggtcag gccccctgga tgggaagcta cagaggtcag 27060 acccagcctg gtagtgggat ggcagctatt gggactggtg gcccacgaga tggacagact 27120 cctctggggc cagtcccaca tcctcctgtt cagggctcca ttgagtgcac acgacttggc 27180 ccagagcagg cacctaggat tgcaggtcaa atgggactgc agtgcccaag gacaacagag 27240 gcaggaaggc ttcctggagg gaggggccct ggggccctca ttctggctca cccacagagt 27300 cagagtcact gtcagaggag ctgcagaacg ctgggctgct ggagcacatc ctggagcacc 27360 tcaacagctc cctcaaaagc agggacgtct gcgccagcgg cctgggcctg ctctgggccc 27420 tcctgctgga cggtgagggg ccctcctcct gctgtcccac cggggctggc agccctcccc 27480 cagcccctcc ctaactgccc ctgagagcct tcgaggacct ccatgtcctg tccctaaaac 27540 acaacagcca 
tagtccggga aaggctcttc tgagagcttc caactccaac agaagaaaat 27600 caaggagcag agagagaaaa ggcaggggag aaaggccttc tggcagaggc cgggtttcag 27660 gacttcttgc ccagtgggca gacccctcag ttttaagtgc ctcctgccca gggaaatgtc 27720 ctgggatttt ccgggcagtc ctggttccag agggcagcgg tggtgctgga tgccctgttt 27780 gttttgattt ttgattcaca gtagggggcc ccctggcctg tgctgcttcc tctcctctag 27840 accccatctt ggcactccag cgccccagga aaaagagagc tccaaaccac ggaaagcccg 27900 ggaaacccaa gaaccctgcc agcacccaaa gtgtgggatt ctccaagcct ctcctgggct 27960 aacccctgca cccgtctctg aggacagttg acctttccca ccccatttct gctgttgtcg 28020 ttagctggag gaaggcagca gatgggggat gggaaggccc ccctgcacac acccaaggcc 28080 tgggtgtccc cttccatccc tgtcctcgtt ccaggtatca ttgtgaacaa ggcccccttg 28140 gagaaggtcc cggacctcat cagccaggtg ttggccacct accctgcgga tggggaaatg 28200 gcggaagcca gctgcggagt cttctggctg ctgtccctgc tgggtgagct ggatgggcgc 28260 cctgggcccc tggggctggg aggggtgggc ctcatggcac agcaggcaca aggcagcccg 28320 gcccctttct gcaggctgca tcaaggagca gcagtttgaa caagtggtgg cgctgctcct 28380 gcaaagcatc cggctgtgcc aggacagagc cctgctggtg aacaatgcct accggggact 28440 ggccagcctg gtgaaggtgt caggtgagcc tggggacagg acgaggctgc cacctagagg 28500 tgggggcaag aatcagcccc catcagttac atctgccagg tgccacaaac caaaaaacag 28560 aagcaacaaa tcaaaaagga aaagaaatta aaaacgatct gaagtccagt catccagaaa 28620 tcaccatcaa gactttcacg cacacttgat aaactcttgt ctctgcgtta tgctaccctg 28680 tgaccctctc tctgtccata aacacatcac atctgcacag gtttcctaac atgcaggcac 28740 accgtgactg atcaaaacag ctctgcaaac agtgtctcca attccccaca acacaaaccc 28800 tgcctgttac tcagctagac aggctgggcc cagcgctgag cacagcacaa ccgacgctcg 28860 gcccacagca cagtccttca gagagcatcc tgggcctggc caagacacta gctggtgcct 28920 ggcaactccg ggctcatggt cttgacctct gtactaccta atcttcccca gagtgacaac 28980 gacccctttg gctctggggg ggctgcctcc tctgttcttg catggtcctg ttcaggtcat 29040 gccagcctac tggtccgccc aagctgatgg ggcctcctgg gtcccgtctc ttgtcctgtc 29100 ccaggcccct gtgtgagctc tggggtccca tcccgtcctg ggcagaaggc tcttcccttt 29160 caggggaaag cagggaatga acccactccc acccatcccc cagagctggc ggccttcaag 
29220 gtggtggtgc aggaggaggg cggcagtggc ctcagcctca tcaaggagac ctaccagctc 29280 cacagggacg acccggaggt ggtggagaac gtgggcatgc tgctggtcca cctggcttcc 29340 tatggtgaga accccttctc acctcacact ccctagagcc cagcggtcag gggtgccccg 29400 ctccccctat aactgacagg gaaggagcac atggaaggtg ggctcaaccc cacttctcgg 29460 cccacttaaa cttcccactc atttggcatc ttctgagcac caggggttgt cctggctgag 29520 ggtgacgctt ggggctccgg aactgcaagg tggctctgtg catgccaagc ccaaggggga 29580 atgtgaccca ctctcatcct tctggggctt ctggcaaggg gcacaggaag gactctggcc 29640 tcaggacctt cctgctccac ctgcagagga gatcctgccg gagctggtgt ccagtagtat 29700 gaaggccctg ctccaggaga tcaaggagcg cttcacctcc agcctggtga gtgacagcag 29760 cgccttcagc aaaccaggcc tccctccagg tggaagcccc cagctggggt gcaccacgtc 29820 tgggggactg gaatag 29836 4 2553 DNA Homo sapiens 4 gttttgtacc agctgaatcc tggggccttg ggggtgaacc tggtggtgga ggaaatggaa 60 accaaagtca agcatgtgat aaagcagctc ttcccatgtt ggagaagctt cctgatgcgg 120 cagctggatt cctcgctgct gacacttgcg gagactaatc tggttggggt agatgtgggg 180 gtggaatgca tggatgacca ttacgccagt caggccctgg aggagctgat gccactgctg 240 aagctgcggc acgcccacat ctctgtgtac caggagctgt tcatcacgtg gaatggggag 300 atctcttctc tgtacctctg cctggtgatg gagttcaatg agctcagctt ccaggaggtc 360 attgaggata agaggaaggc aaagaaaatc attgactctg agtggatgca gaatgtgctg 420 ggccaggtgc tggacgcgct ggaatacctg caccatttgg acatcatcca caggaatctc 480 aaaccctcca acatcatcct catcagcagt gaccactgca aactgcagga cctgagttcc 540 aatgtgctaa tgacagacaa agccaaatgg aatattcgtg cggaggaagg gcagaggcag 600 ggccccacag acacccaaca tttgagagaa acaaagtcgt ggttgtttgt ggtaccccag 660 aaaatgttgc ctctcatgga gggaaaagaa agtgtcagaa ggaaggatat gaaaatgccc 720 aggacggagg gagacccctt tcgtaagtcc tggatggccc ctgaagccct caacttctcc 780 ttcagccaga aatcagacat ctggtccctg ggctgcatca ttctggacat gaccagctgc 840 tccttcatgg atggcacaga agccatgcat ctgcggaagt ccctccgcca gagcccaggc 900 agcctgaagg ccgtcctgaa gacaatggag gagaagcaga tcccggatgt ggaaaccttc 960 aggaatcttc tgcccttgat gctccagatc gacccctcgg atcgaataac gataaaggac 1020 gtggtgcaca tcaccttctt gagaggctcc 
ttcaagtcct cgtgcgtctc tctgaccctg 1080 caccggcaga tggtgcctgc gtccatcacc gacatgctgt tagaaggcaa cgtggccagc 1140 attttaggtg atgctgggga cacaaagggg gagcgtgccc tgaagctcct gtccatggcc 1200 ttggcatcct attgtttagt tccagagggt tcattattta tgcccctggc cttgctccac 1260 atgcacgacc agtggctcag ctgtgaccag gacagagtcc ctgggaagag agactttgcc 1320 tccctgggga aactagggaa gctgttgggc cccatcccaa agggtctgcc gtggcccccg 1380 gagctggtgg aggtggtggt cacgaccatg gagctacatg acagggtcct cgatgtccag 1440 ctgtgtgcct gctccctgct gctgcacctc ctgggccaag gcctgccttt tgcctgctcc 1500 gtggccctgg acaagttcct gatgatcctg ccagttttcc cagctatgaa gcgaggagct 1560 ggacacgagg tcctctggag tcaccctcag ggaggatggg ttgtgtcctc tgaagagggc 1620 tgcgctggtg caccacccgg aagccaaggc tccctgcaac caagccatca cctccaccct 1680 gctgagtgct cttcagagcc accccgagga ggagccactt cttgtcatgg tctacagcct 1740 gctagccatc accacaaccc aggggcccag tgggcttccg aagccgccag ccaggactgt 1800 gggaaggaga gggccataca gagcgctcac accttcaccc acaaatcgga gtcagagtca 1860 ctgtcagagg agctgcagaa cgctgggctg ctggagcaca tcctggagca cctcaacagc 1920 tccctcaaaa gcagggacgt ctgcgccagc ggcctgggcc tgctctgggc cctcctgctg 1980 gacgacccca tcttggcact ccagcgcccc aggaaaaaga gagctccaaa ccacggaaag 2040 cccgggaaac ccaagaaccc tgccagcacc caaagtatca ttgtgaacaa ggcccccttg 2100 gagaaggtcc cggacctcat cagccaggtg ttggccacct accctgcgga tggggaaatg 2160 gcagaagcca gctgcggagt cttctggctg ctgtccctgc tgggctgcat caaggagcag 2220 cagtttgaac aagtggtggc gctgctcctg caaagcatcc ggctgtgcca ggacagagcc 2280 ctgctggtga acaatgccta ccggggactg gccagcctgg tgaaggtgtc agagctggcg 2340 gccttcaagg tggtggtgca ggaggagggc ggcagtggcc tcagcctcat caaggagacc 2400 taccagctcc acagggacga cccggaggtg gtggagaacg tgggcatgct gctggtccac 2460 ctggcttcct atgaggagat cctgccggag ctggtgtcca gtagtatgaa ggccctgctc 2520 caggagatca aggagcgctt cacctccagc ctg 2553 5 2115 DNA Homo sapiens 5 gaggtggtgg ctgtgcagat gatggtggaa tgcatggatg accattacgc cagtcaggcc 60 ctggaggagc tgatgccact gctgaagctg cggcacgccc acatctctgt gtaccaggag 120 ctgttcatca cgtggaatgg ggagatctct tctctgtacc 
tctgcctggt gatggagttc 180 aatgagctca gcttccagga ggtcattgag gataagagga aggcaaagaa aatcattgac 240 tctgagtgga tgcagaatgt gctgggccag gtgctggacg cgctggaata cctgcaccat 300 ttggacatca tccacaggaa tctcaaaccc tccaacatca tcctcatcag cagtgaccac 360 tgcaaactgc aggacctgag ttccaatgtg ctaatgacag acaaagccaa atggaatatt 420 cgtgcggagg aagacccctt tcgtaagtcc tggatggccc ctgaagccct caacttctcc 480 ttcagccaga aatcagacat ctggtccctg ggctgcatca ttctggacat gaccagctgc 540 tccttcatgg atggcacaga agccatgcat ctgcggaagt ccctccgcca gagcccaggc 600 agcctgaagg ccgtcctgaa gacaatggag gagaagcaga tcccggatgt ggaaaccttc 660 aggaatcttc tgcccttgat gctccagatc gacccctcgg atcgaataac gataaaggac 720 gtggtgcaca tcaccttctt gagaggctcc ttcaagtcct cgtgcgtctc tctgaccctg 780 caccggcaga tggtgcctgc gtccatcacc gacatgctgt tagaaggcaa cgtggccagc 840 attttaggtg atgctgggga cacaaagggg gagcgtgccc tgaagctcct gtccatggcc 900 ttggcatcct attgtttagt tccagagggt tcattattta tgcccctggc cttgctccac 960 atgcacgacc agtggctcag ctgtgaccag gacagagtcc ctgggaagag agactttgcc 1020 tccctgggga aactagggaa gctgttgggc cccatcccaa agggtctgcc gtggcccccg 1080 gagctggtgg aggtggtggt cacgaccatg gagctacatg acagggtcct cgatgtccag 1140 ctgtgtgcct gctccctgct gctgcacctc ctgggccaag cgctggtgca ccacccggaa 1200 gccaaggctc cctgcaacca agccatcacc tccaccctgc tgagtgctct tcagagccac 1260 cccgaggagg agccacttct tgtcatggtc tacagcctgc tagccatcac cacaacccag 1320 gagtcagagt cactgtcaga ggagctgcag aacgctgggc tgctggagca catcctggag 1380 cacctcaaca gctccctcga aagcagggac gtctgcgcca gcggcctggg cctgctctgg 1440 gccctcctgc tggacgaccc catcttggca ctccagcgcc ccaggaaaaa gagagctcca 1500 aaccacggaa agcccgggaa acccaagaac cctgccagca cccaaagtat cattgtgaac 1560 aaggccccct tggagaaggt cccggacctc atcagccagg tgttggccac ctaccctgcg 1620 gatggggaaa tggcagaagc cagctgcgga gtcttctggc tgctgtccct gctgggctgc 1680 atcaaggagc agcagtttga acaagtggtg gcgctgctcc tgcaaagcat ccggctgtgc 1740 caggacagag ccctgctggt gaacaatgcc taccggggac tggccagcct ggtgaaggtg 1800 tcagagctgg cggccttcaa ggtggtggtg caggaggagg gcggcagtgg cctcagcctc 
1860 atcaaggaga cctaccagct ccacagggac gacccggagg tggtggagaa cgtgggcatg 1920 ctgctggtcc acctggcttc ctatgaggag atcctgccgg agctggtgtc cagtagtatg 1980 aaggccctgc tccaggagat caaggagcgc ttcacctcca gcctggtgag tgacagcagc 2040 gccttcagca aaccaggcct ccctccaggt ggaagccccc agctggggtg caccacgtct 2100 gggggactgg aatag 2115 6 704 PRT Homo sapiens 6 Glu Val Val Ala Val Gln Met Met Val Glu Cys Met Asp Asp His Tyr 1 5 10 15 Ala Ser Gln Ala Leu Glu Glu Leu Met Pro Leu Leu Lys Leu Arg His 20 25 30 Ala His Ile Ser Val Tyr Gln Glu Leu Phe Ile Thr Trp Asn Gly Glu 35 40 45 Ile Ser Ser Leu Tyr Leu Cys Leu Val Met Glu Phe Asn Glu Leu Ser 50 55 60 Phe Gln Glu Val Ile Glu Asp Lys Arg Lys Ala Lys Lys Ile Ile Asp 65 70 75 80 Ser Glu Trp Met Gln Asn Val Leu Gly Gln Val Leu Asp Ala Leu Glu 85 90 95 Tyr Leu His His Leu Asp Ile Ile His Arg Asn Leu Lys Pro Ser Asn 100 105 110 Ile Ile Leu Ile Ser Ser Asp His Cys Lys Leu Gln Asp Leu Ser Ser 115 120 125 Asn Val Leu Met Thr Asp Lys Ala Lys Trp Asn Ile Arg Ala Glu Glu 130 135 140 Asp Pro Phe Arg Lys Ser Trp Met Ala Pro Glu Ala Leu Asn Phe Ser 145 150 155 160 Phe Ser Gln Lys Ser Asp Ile Trp Ser Leu Gly Cys Ile Ile Leu Asp 165 170 175 Met Thr Ser Cys Ser Phe Met Asp Gly Thr Glu Ala Met His Leu Arg 180 185 190 Lys Ser Leu Arg Gln Ser Pro Gly Ser Leu Lys Ala Val Leu Lys Thr 195 200 205 Met Glu Glu Lys Gln Ile Pro Asp Val Glu Thr Phe Arg Asn Leu Leu 210 215 220 Pro Leu Met Leu Gln Ile Asp Pro Ser Asp Arg Ile Thr Ile Lys Asp 225 230 235 240 Val Val His Ile Thr Phe Leu Arg Gly Ser Phe Lys Ser Ser Cys Val 245 250 255 Ser Leu Thr Leu His Arg Gln Met Val Pro Ala Ser Ile Thr Asp Met 260 265 270 Leu Leu Glu Gly Asn Val Ala Ser Ile Leu Gly Asp Ala Gly Asp Thr 275 280 285 Lys Gly Glu Arg Ala Leu Lys Leu Leu Ser Met Ala Leu Ala Ser Tyr 290 295 300 Cys Leu Val Pro Glu Gly Ser Leu Phe Met Pro Leu Ala Leu Leu His 305 310 315 320 Met His Asp Gln Trp Leu Ser Cys Asp Gln Asp Arg Val Pro Gly Lys 325 330 335 Arg Asp Phe Ala Ser Leu Gly Lys Leu Gly Lys Leu Leu Gly Pro Ile 340 
345 350 Pro Lys Gly Leu Pro Trp Pro Pro Glu Leu Val Glu Val Val Val Thr 355 360 365 Thr Met Glu Leu His Asp Arg Val Leu Asp Val Gln Leu Cys Ala Cys 370 375 380 Ser Leu Leu Leu His Leu Leu Gly Gln Ala Leu Val His His Pro Glu 385 390 395 400 Ala Lys Ala Pro Cys Asn Gln Ala Ile Thr Ser Thr Leu Leu Ser Ala 405 410 415 Leu Gln Ser His Pro Glu Glu Glu Pro Leu Leu Val Met Val Tyr Ser 420 425 430 Leu Leu Ala Ile Thr Thr Thr Gln Glu Ser Glu Ser Leu Ser Glu Glu 435 440 445 Leu Gln Asn Ala Gly Leu Leu Glu His Ile Leu Glu His Leu Asn Ser 450 455 460 Ser Leu Glu Ser Arg Asp Val Cys Ala Ser Gly Leu Gly Leu Leu Trp 465 470 475 480 Ala Leu Leu Leu Asp Asp Pro Ile Leu Ala Leu Gln Arg Pro Arg Lys 485 490 495 Lys Arg Ala Pro Asn His Gly Lys Pro Gly Lys Pro Lys Asn Pro Ala 500 505 510 Ser Thr Gln Ser Ile Ile Val Asn Lys Ala Pro Leu Glu Lys Val Pro 515 520 525 Asp Leu Ile Ser Gln Val Leu Ala Thr Tyr Pro Ala Asp Gly Glu Met 530 535 540 Ala Glu Ala Ser Cys Gly Val Phe Trp Leu Leu Ser Leu Leu Gly Cys 545 550 555 560 Ile Lys Glu Gln Gln Phe Glu Gln Val Val Ala Leu Leu Leu Gln Ser 565 570 575 Ile Arg Leu Cys Gln Asp Arg Ala Leu Leu Val Asn Asn Ala Tyr Arg 580 585 590 Gly Leu Ala Ser Leu Val Lys Val Ser Glu Leu Ala Ala Phe Lys Val 595 600 605 Val Val Gln Glu Glu Gly Gly Ser Gly Leu Ser Leu Ile Lys Glu Thr 610 615 620 Tyr Gln Leu His Arg Asp Asp Pro Glu Val Val Glu Asn Val Gly Met 625 630 635 640 Leu Leu Val His Leu Ala Ser Tyr Glu Glu Ile Leu Pro Glu Leu Val 645 650 655 Ser Ser Ser Met Lys Ala Leu Leu Gln Glu Ile Lys Glu Arg Phe Thr 660 665 670 Ser Ser Leu Val Ser Asp Ser Ser Ala Phe Ser Lys Pro Gly Leu Pro 675 680 685 Pro Gly Gly Ser Pro Gln Leu Gly Cys Thr Thr Ser Gly Gly Leu Glu 690 695 700 7 21 DNA Homo sapiens 7 aatggaatat tcgtgcggag g 21 8 21 RNA Homo sapiens 8 uggaauauuc gugcggaggu u 21 9 21 RNA Homo sapiens 9 uuaccuuaua agcacgccuc c 21 10 21 DNA Homo sapiens 10 aatattcgtg cggaggaaga c 21 11 21 RNA Homo sapiens 11 uauucgugcg gaggaagacu u 21 12 21 RNA Homo sapiens 12 uuauaagcac 
gccuccuucu g 21 13 21 DNA Homo sapiens 13 aagttcctga tgatcctgcc a 21 14 21 RNA Homo sapiens 14 guuccugaug auccugccau u 21 15 21 RNA Homo sapiens 15 uucaaggacu acuaggacgg u 21 16 21 DNA Homo sapiens 16 catcaccttc ttgagaggct c 21 17 21 RNA Homo sapiens 17 ucaccuucuu gagaggcucu u 21 18 21 RNA Homo sapiens 18 uuaguggaag aacucuccga g 21 19 21 DNA Homo sapiens 19 caagttcctg atgatcctgc c 21 20 21 RNA Homo sapiens 20 aguuccugau gauccugccu u 21 21 21 RNA Homo sapiens 21 uuucaaggac uacuaggacg g 21 22 21 DNA Homo sapiens 22 gatgaccatt acgccagtca g 21 23 21 RNA Homo sapiens 23 ugaccauuac gccagucagu u 21 24 21 RNA Homo sapiens 24 uuacugguaa ugcggucagu c 21 25 21 DNA Homo sapiens 25 gaatattcgt gcggaggaag a 21 26 21 RNA Homo sapiens 26 auauucgugc ggaggaagau u 21 27 21 RNA Homo sapiens 27 uuuauaagca cgccuccuuc u 21 28 21 DNA Homo sapiens 28 gaaggccgtc ctgaagacaa t 21 29 21 RNA Homo sapiens 29 aggccguccu gaagacaauu u 21 30 21 RNA Homo sapiens 30 uuuccggcag gacuucuguu a 21 
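
The 21-nucleotide entries in SEQ ID NOS: 7-30 appear in groups of three: a DNA target, followed by two RNA strands. Judging from the listed sequences alone, the first RNA strand seems to be the target with its first two nucleotides dropped, transcribed to RNA and extended by a "uu" overhang. The second seems to be "uu" followed by the base-by-base complement of the same 19-nucleotide core, written without reversal. The following minimal sketch reproduces that relationship for SEQ ID NOS: 7-9. The function name `sirna_pair` and the derivation rule are assumptions inferred from the listing, not a design procedure stated in this section.

```python
# Hypothetical helper: derives the two RNA strands that accompany a 21-nt
# DNA target in the listing. The rule is inferred from SEQ ID NOS: 7-30.

# DNA base -> complementary RNA base
COMPLEMENT_RNA = {"a": "u", "c": "g", "g": "c", "t": "a"}


def sirna_pair(target_dna: str) -> tuple[str, str]:
    """Return (sense, antisense) RNA strands for a 21-nt DNA target.

    The antisense strand is returned as the plain complement (not the
    reverse complement), matching how the listing appears to write it.
    """
    target = target_dna.replace(" ", "").lower()
    if len(target) != 21 or set(target) - set("acgt"):
        raise ValueError("expected a 21-nt DNA sequence of a/c/g/t")
    core = target[2:]  # 19-nt core shared by both strands
    sense = core.replace("t", "u") + "uu"
    antisense = "uu" + "".join(COMPLEMENT_RNA[base] for base in core)
    return sense, antisense


# SEQ ID NO:7 as listed (spaces are the listing's 10-nt grouping)
sense, antisense = sirna_pair("aatggaatat tcgtgcggag g")
print(sense)      # compare with SEQ ID NO:8
print(antisense)  # compare with SEQ ID NO:9
```

Running the same function on the other DNA targets (SEQ ID NOS: 10, 13, 16, 19, 22, 25 and 28) gives a quick consistency check on the RNA entries that follow each of them.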

What is claimed is:
 1. An isolated polynucleotide comprising a nucleic acid sequence which encodes the amino acid sequence depicted in SEQ ID NO:2.
 2. The polynucleotide according to claim 1, wherein the nucleic acid sequence is selected from the group consisting of: (a) the nucleic acid sequence as shown in SEQ ID NO:1; (b) the complement of (a); and (c) a nucleic acid sequence that differs from (a) or (b) due to the degeneracy of the genetic code.
 3. The polynucleotide according to claim 1, wherein the nucleic acid sequence is selected from the group consisting of: (a) the nucleic acid sequence as shown in SEQ ID NO:3; (b) the complement of (a); and (c) a nucleic acid sequence that differs from (a) or (b) due to the degeneracy of the genetic code.
 4. An isolated polynucleotide comprising a variant of a nucleic acid sequence, wherein said nucleic acid sequence encodes the amino acid sequence depicted in SEQ ID NO:2, and wherein the variant and said nucleic acid sequence have at least 91% sequence identity.
 5. The polynucleotide according to claim 4, wherein the variant and said nucleic acid sequence have at least 95% sequence identity.
 6. An isolated polynucleotide that hybridizes under stringent conditions to a polynucleotide consisting of the nucleotide sequence of SEQ ID NO:1 or the complement thereof, wherein said polynucleotide consists of at least 1000 nucleic acids and does not include the nucleotide sequences of SEQ ID NOS: 4-5 or the complement thereof.
 7. An isolated polynucleotide that hybridizes under highly stringent conditions to a polynucleotide consisting of the nucleotide sequence of SEQ ID NO:1 or the complement thereof, wherein said polynucleotide consists of at least 2000 nucleic acids and encodes a protein kinase.
 8. An isolated polypeptide comprising a fragment of SEQ ID NO:2, wherein said fragment comprises at least 500 consecutive amino acid residues of SEQ ID NO:2.
 9. The polypeptide according to claim 8, wherein the fragment consists of SEQ ID NO:2.
 10. An isolated polypeptide comprising a variant of a fragment of SEQ ID NO:2, wherein said fragment includes at least 500 consecutive amino acid residues of SEQ ID NO:2.
 11. The polypeptide according to claim 10, wherein the variant and said fragment have at least 95% sequence identity.
 12. An antibody capable of binding to the amino acid sequence depicted in SEQ ID NO:2 with a binding affinity of no less than 10⁵ M⁻¹.
 13. An NRHK1 detection kit comprising: (a) the antibody of claim 12, or (b) a probe that hybridizes to the nucleotide sequence of SEQ ID NO:1 or the complement thereof.
 14. A host cell containing the polynucleotide of claim 1 or a variant thereof.
 15. A transgenic non-human animal comprising the polynucleotide of claim 1 or a variant thereof.
 16. A non-human animal, wherein at least one allele of a gene in the genome of said animal is functionally disrupted, and wherein said gene encodes a polypeptide that has at least 70% sequence identity to SEQ ID NO:2.
 17. A method for identifying an agent capable of binding to NRHK1 kinase, comprising contacting a candidate agent with a polypeptide comprising: (a) an amino acid sequence recited in SEQ ID NO:2, (b) a fragment of SEQ ID NO:2, or (c) a variant of (a) or (b); and detecting the binding between said candidate agent and said polypeptide.
 18. A method for identifying an agent capable of modulating the level of activity of NRHK1 kinase, comprising: contacting a candidate agent with a polypeptide comprising: (a) an amino acid sequence recited in SEQ ID NO:2, or (b) a biologically active portion of SEQ ID NO:2; and detecting a change in the level of an activity of said polypeptide.
 19. A pharmaceutical composition for preventing or treating NRHK1-related diseases, comprising a pharmaceutically acceptable carrier and an agent that modulates an NRHK1 activity or the NRHK1 gene expression.
 20. A method for preventing or treating an NRHK1-related disease in a subject, comprising the step of: introducing into the subject an effective amount of the pharmaceutical composition of claim 19.
 21. A polynucleotide capable of inhibiting human NRHK1 gene expression by RNA interference.
 22. The polynucleotide according to claim 21, comprising a siRNA sense strand or a siRNA antisense strand selected from Table 4.
 23. A method, comprising introducing a polynucleotide of claim 21 into a cell which expresses human NRHK1 gene, thereby inhibiting the expression of said gene in said cell by RNA interference.
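
Several of the claims above recite percent sequence identity thresholds (for example, at least 91% or at least 95%). As a minimal illustration of how such a figure can be computed from two sequences that have already been aligned, the following Python sketch counts matching alignment columns. The function name `percent_identity`, the use of "-" for gaps, and the choice of total alignment length as the denominator are assumptions made for illustration; alignment programs differ in how they define the denominator, and this sketch does not perform the alignment itself.

```python
# Illustrative sketch only: identity over a pre-computed pairwise alignment.
# Gap symbol and denominator are assumptions, not a prescribed definition.
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences share a residue."""
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and equal in length")
    pairs = zip(aligned_a.upper(), aligned_b.upper())
    # A column counts as a match only if the residues agree and are not gaps.
    matches = sum(1 for x, y in pairs if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


# Hypothetical toy sequences, not taken from the sequence listing.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAT")
print(identity >= 91.0)  # False: 9 of 10 columns match
```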